[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-10 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66575743
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
          .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
      }
    }
+
+  /**
+   * Drop the non-partition key expression in the disjunctions, to optimize the partition pruning.
+   * For instances: (We assume part1 & part2 are the partition keys)
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5)  ==> (part1 == 1 or part1 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2)
+   * @param predicate disjunctions
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def partitionPrunningFromDisjunction(
+      predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = {
+    // ignore the pure non-partition key expression in conjunction of the expression tree
+    val additionalPartPredicate = predicate transformUp {
+      case a @ And(left, right) if a.deterministic &&
+        left.references.intersect(partitionKeyIds).isEmpty => right
+      case a @ And(left, right) if a.deterministic &&
+        right.references.intersect(partitionKeyIds).isEmpty => left
--- End diff --

Great point @clockfly, but maybe the optimizer will turn this expression into (!(partition = 1) || !(a > 3))?

[BooleanSimplification](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L907)
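
To make the concern concrete, here is a toy sketch of that rewrite in standalone Scala (the case classes are illustrative stand-ins, not the real Catalyst expressions):

    // BooleanSimplification applies De Morgan's laws, so a negated conjunction
    // becomes a disjunction of negations before the pruning rule ever sees it.
    sealed trait Expr
    case class Leaf(s: String) extends Expr
    case class Not(e: Expr) extends Expr
    case class And(l: Expr, r: Expr) extends Expr
    case class Or(l: Expr, r: Expr) extends Expr

    def deMorgan(e: Expr): Expr = e match {
      case Not(And(l, r)) => Or(deMorgan(Not(l)), deMorgan(Not(r)))
      case Not(Or(l, r))  => And(deMorgan(Not(l)), deMorgan(Not(r)))
      case Not(x)         => Not(deMorgan(x))
      case And(l, r)      => And(deMorgan(l), deMorgan(r))
      case Or(l, r)       => Or(deMorgan(l), deMorgan(r))
      case leaf           => leaf
    }

    // deMorgan(Not(And(Leaf("partition = 1"), Leaf("a > 3"))))
    //   == Or(Not(Leaf("partition = 1")), Not(Leaf("a > 3")))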






[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-09 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66564745
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +69,95 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
          sql("DROP TABLE IF EXISTS createAndInsertTest")
      }
    }
+
+  test("partition pruning in disjunction") {
+    withSQLConf((SQLConf.HIVE_VERIFY_PARTITION_PATH.key, "true")) {
+      val testData = sparkContext.parallelize(
+        (1 to 10).map(i => TestData(i, i.toString))).toDF()
+      testData.registerTempTable("testData")
+
+      val testData2 = sparkContext.parallelize(
+        (11 to 20).map(i => TestData(i, i.toString))).toDF()
+      testData2.registerTempTable("testData2")
+
+      val testData3 = sparkContext.parallelize(
+        (21 to 30).map(i => TestData(i, i.toString))).toDF()
+      testData3.registerTempTable("testData3")
+
+      val testData4 = sparkContext.parallelize(
+        (31 to 40).map(i => TestData(i, i.toString))).toDF()
+      testData4.registerTempTable("testData4")
+
+      val tmpDir = Files.createTempDir()
+      // create the table for test
+      sql(s"CREATE TABLE table_with_partition(key int,value string) " +
+        s"PARTITIONED by (ds string, ds2 string) location '${tmpDir.toURI.toString}' ")
+      sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='1', ds2='d1') " +
+        "SELECT key,value FROM testData")
+      sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='2', ds2='d1') " +
+        "SELECT key,value FROM testData2")
+      sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='3', ds2='d3') " +
+        "SELECT key,value FROM testData3")
+      sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='4', ds2='d4') " +
+        "SELECT key,value FROM testData4")
+
+      checkAnswer(sql("select key,value from table_with_partition"),
+        testData.collect ++ testData2.collect ++ testData3.collect ++ testData4.collect)
+
+      checkAnswer(
+        sql(
+          """select key,value from table_with_partition
+            | where (ds='4' and key=38) or (ds='3' and key=22)""".stripMargin),
+        Row(38, "38") :: Row(22, "22") :: Nil)
+
+      checkAnswer(
+        sql(
+          """select key,value from table_with_partition
+            | where (key<40 and key>38) or (ds='3' and key=22)""".stripMargin),
+        Row(39, "39") :: Row(22, "22") :: Nil)
+
+      sql("DROP TABLE table_with_partition")
+      sql("DROP TABLE createAndInsertTest")
--- End diff --

Not really sure why we should drop "createAndInsertTest"; I can't find it anywhere. Maybe those temp tables named "testData*" are the ones that should be dropped. ^_^
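
Something like this at the end of the test would do the cleanup I mean (a sketch; it assumes dropTempTable is available on the suite's sqlContext, as in other tests of this era):

    // Drop the temp tables this test actually registered.
    Seq("testData", "testData2", "testData3", "testData4")
      .foreach(sqlContext.dropTempTable)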





[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-09 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66563744
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
          .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
      }
    }
+
+  /**
+   * Drop the non-partition key expression in the disjunctions, to optimize the partition pruning.
--- End diff --

"Drop the non-partition key expression in the disjunctions". Should it be 
"conjunctions"? 





[GitHub] spark pull request #13522: [SPARK-14321][SQL] Reduce date format cost and st...

2016-06-06 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13522#discussion_r65899067
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -435,20 +437,23 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes {
      case StringType if right.foldable =>
        val sdf = classOf[SimpleDateFormat].getName
        val fString = if (constFormat == null) null else constFormat.toString
-       val formatter = ctx.freshName("formatter")
        if (fString == null) {
          ev.copy(code = s"""
            boolean ${ev.isNull} = true;
            ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""")
        } else {
+         val formatter = ctx.freshName("formatter")
+         ctx.addMutableState(sdf, formatter, s"""$formatter = null;""")
--- End diff --

Not very familiar with codegen, but I wonder if we can add the instantiation here and avoid the null check below:

    ctx.addMutableState(sdf, formatter, s"""$formatter = new $sdf("$fString");""")
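
To make the trade-off concrete, a standalone sketch of the two strategies in plain Scala, standing in for the generated Java (the "yyyy-MM-dd" format is an arbitrary placeholder):

    import java.text.SimpleDateFormat

    class LazyInit {
      private var formatter: SimpleDateFormat = null  // mirrors: $formatter = null;
      def eval(s: String): Long = {
        if (formatter == null) {                      // per-row null check
          formatter = new SimpleDateFormat("yyyy-MM-dd")
        }
        formatter.parse(s).getTime
      }
    }

    class EagerInit {
      // mirrors: $formatter = new $sdf("$fString"); no per-row check needed
      private val formatter = new SimpleDateFormat("yyyy-MM-dd")
      def eval(s: String): Long = formatter.parse(s).getTime
    }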





[GitHub] spark pull request #13522: [SPARK-14321][SQL] Reduce date format cost and st...

2016-06-06 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13522#discussion_r65898385
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -554,14 +561,19 @@ case class FromUnixTime(sec: Expression, format: Expression)
          boolean ${ev.isNull} = true;
          ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""")
      } else {
+        val sdfTerm = ctx.freshName("formatter")
--- End diff --

This is trivial, but why use a different variable name here than the one above (which is called "formatter")?





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r65799660
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -350,6 +350,10 @@ private[sql] object PartitioningUtils {
        case _ => throw new AnalysisException(s"Cannot use ${field.dataType} for partition column")
      }
    }
+
+    if (partitionColumns.size == schema.fields.size) {
+      throw new AnalysisException(s"Cannot use all columns for partition columns")
+    }
  }
--- End diff --

Yeah, I think it's better.





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r65799422
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -350,6 +350,10 @@ private[sql] object PartitioningUtils {
        case _ => throw new AnalysisException(s"Cannot use ${field.dataType} for partition column")
      }
    }
+
+    if (partitionColumns.size == schema.fields.size) {
+      throw new AnalysisException(s"Cannot use all columns for partition columns")
+    }
  }
--- End diff --

One little concern: if it is added here, should the method name be changed? After all, it will do more than validate data types after the change.
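
A hedged sketch of the renaming idea (the name validatePartitionColumns is hypothetical; the body just folds the new check into the existing one):

    // Hypothetical rename: the method now validates more than data types.
    private[sql] def validatePartitionColumns(
        schema: StructType, partitionColumns: Seq[String]): Unit = {
      // ... the existing per-column data-type checks ...
      if (partitionColumns.size == schema.fields.size) {
        throw new AnalysisException("Cannot use all columns for partition columns")
      }
    }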





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-24 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-221457800
  
Thanks @rxin . Added it.





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-24 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r64335588
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
      try {
        Option(hive.getFunction(db, name)).map(fromHiveFunction)
      } catch {
-      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
+      case e: Throwable if isCausedBy(e, s"$name does not exist") =>
--- End diff --

@andrewor14 thanks. Changed to NonFatal.





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-24 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r64334408
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
      try {
        Option(hive.getFunction(db, name)).map(fromHiveFunction)
      } catch {
-      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
+      case e: Throwable if isCausedBy(e, s"$name does not exist") =>
--- End diff --

@andrewor14 will this work?





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-221147177
  
Hi @andrewor14, I have checked out the CausedBy source code; I think it returns the root cause of the exception being thrown, not the first exception.
I copied the CausedBy source code and created a notebook:
(https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/390461470296902/58107563000366/latest.html)
Would you please go over it sometime? If it is the situation you are worried about, I think we can catch it.
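
For reference, CausedBy boils down to something like this (my minimal re-creation from reading the source; it recurses all the way down the cause chain):

    object CausedBy {
      // Always unwraps to the innermost (root) cause, so intermediate causes
      // in the chain are never the one being matched.
      def unapply(e: Throwable): Option[Throwable] =
        Option(e.getCause).flatMap(unapply).orElse(Some(e))
    }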





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-22 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-220880826
  
Hi @andrewor14, sorry to bother you, but does this PR need to be refined further, or is it ready to merge? Could you please give me some guidance? Thanks.





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-22 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220834946
  
@cloud-fan It failed on some unrelated cases too; can you help me retest it again?





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-22 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13169#discussion_r64150440
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -353,6 +353,20 @@ class DateTimeUtilsSuite extends SparkFunSuite {
        c.getTimeInMillis * 1000 + 123456)
    }

+  test("SPARK-15379: special invalid date string") {
+    // Test stringToDate
+    assert(stringToDate(
+      UTF8String.fromString("2015-02-29 00:00:00")).isEmpty)
--- End diff --

Added tests against date strings.





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-20 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220556056
  
Retest this please





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-20 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220549615
  
Seems like an unrelated failure; retest it please.





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-20 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220531075
  
Fixed Scala style; retest it please.





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-19 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220523060
  
Addressed your comments. @cloud-fan 





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-19 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13169#issuecomment-220512727
  
@cloud-fan Could you please take a look at this sometime? It's a simple fix.





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-19 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-220501798
  
@andrewor14 Thanks :-).  Do I still need to modify the code? Frankly, I 
don't really understand your comment above.  ("this won't actually work because 
it'll find the first exception it sees and tries to match the message. You'll 
need to do this recursively and match all the messages in the exception stack")





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-18 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r63815687
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
      try {
        Option(hive.getFunction(db, name)).map(fromHiveFunction)
      } catch {
-      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
+      case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does not exist") =>
--- End diff --

The objective here is not to catch all exceptions but only the ones caused by the function not existing. In my case, this exception is "org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.func does not exist))", whose root cause is MetaException, but it may vary in different situations (not really sure it varies; just conjecture based on previous code. See PRs #12198 and #12853).
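
For context, the isCausedBy helper used above amounts to walking the whole cause chain and matching any message, not just the root (a sketch of the idea, not necessarily the exact code in the patch):

    @annotation.tailrec
    def isCausedBy(e: Throwable, matchMessage: String): Boolean = {
      // Match the current exception's message, then descend into the cause.
      if (e.getMessage != null && e.getMessage.contains(matchMessage)) {
        true
      } else if (e.getCause != null) {
        isCausedBy(e.getCause, matchMessage)
      } else {
        false
      }
    }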





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-18 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13169#discussion_r63814998
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
@@ -58,6 +58,7 @@ object DateTimeUtils {
    final val YearZero = -17999
    final val toYearZero = to2001 + 7304850
    final val TimeZoneGMT = TimeZone.getTimeZone("GMT")
+  final val MonthOf31Days = Set(1,3,5,7,8,10,12)
--- End diff --

Indentation fixed





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-18 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r63773414
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
      try {
        Option(hive.getFunction(db, name)).map(fromHiveFunction)
      } catch {
-      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
+      case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does not exist") =>
--- End diff --

@andrewor14 Maybe it is safer this way. What do you think? 





[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-18 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/13177

[SPARK-15388][SQL] Fix spark sql CREATE FUNCTION using hive 1.2.1

## What changes were proposed in this pull request?

spark.sql("CREATE FUNCTION myfunc AS 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws "org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.myfunc does not exist))" when using Hive 1.2.1.

I think it was introduced by PR #12853. Fix it by catching Exception (not NoSuchObjectException) and matching on the message string.

## How was this patch tested?

Added a unit test and also tested it manually.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark fixCreateFunc2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13177


commit 08435b91b07a9f9aebde493aeec5725e28756ea7
Author: wangyang <wangy...@haizhi.com>
Date:   2016-05-18T19:56:19Z

fix create table with hive 1.2.1







[GitHub] spark pull request: [SPARK-14414][SQL] Make DDL exceptions more co...

2016-05-18 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12853#discussion_r63763111
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -616,7 +619,8 @@ private[hive] class HiveClientImpl(
      try {
        Option(client.getFunction(db, name)).map(fromHiveFunction)
      } catch {
-      case he: HiveException => None
+      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
--- End diff --

In my case, the exception thrown is "org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.myfunc does not exist))", but it turns out that the root cause of this exception is a MetaException whose message is "NoSuchObjectException(message:Function default.myfunc does not exist))", so the exception is not caught. (I ran into this problem when using "CREATE FUNCTION" in Spark SQL with Hive.)





[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date

2016-05-18 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/13169

[SPARK-15379][SQL] check special invalid date

## What changes were proposed in this pull request?

When an invalid date string like "2015-02-29 00:00:00" is cast as date or timestamp in Spark SQL, it used to return not null but another valid date (2015-03-01 in this case).
With this PR, invalid date strings like "2016-02-29" and "2016-04-31" are returned as null when cast as date or timestamp.
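
The core of the check is plain calendar arithmetic; a standalone sketch of the day-validity test (mirroring the MonthOf31Days set the patch adds to DateTimeUtils, simplified here):

    val MonthOf31Days = Set(1, 3, 5, 7, 8, 10, 12)

    def isValidDay(year: Int, month: Int, day: Int): Boolean = {
      if (day < 1) {
        false
      } else if (month == 2) {
        val leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
        day <= (if (leap) 29 else 28)
      } else if (MonthOf31Days.contains(month)) {
        day <= 31
      } else {
        day <= 30
      }
    }

    // isValidDay(2015, 2, 29) == false, so "2015-02-29" is rejected instead of
    // rolling over to 2015-03-01.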

## How was this patch tested?

Unit tests are added.






You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark invalid_date

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13169


commit ef73d79bacc2eab8cbead7aa8991b4ec7de3b862
Author: wangyang <wangy...@haizhi.com>
Date:   2016-05-18T10:04:14Z

check special invalid date







[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier

2016-03-24 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/11929#issuecomment-200703478
  
In my case, my application processes lots of auto-generated table identifiers. Some of them use backticks and some do not. If we upgrade to 1.6.1 without fixing this issue, the existing code will break, and we would have to check whether a table identifier uses backticks all over the place (if the identifier already uses backticks, we cannot add them again). That means changing a lot of code.
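
The workaround would look something like this at every call site that builds an identifier (a hypothetical helper, just to illustrate the churn; not a Spark API):

    // Naive: quote only when not already quoted. It ignores embedded backticks
    // and db-qualified names, which is exactly why sprinkling this everywhere
    // is painful.
    def quoteIfNeeded(id: String): String =
      if (id.startsWith("`") && id.endsWith("`")) id else s"`$id`"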





[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier

2016-03-24 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/11929#issuecomment-200694207
  
BTW, I cannot reproduce this problem in master. I pushed this PR in case there is another release from this branch.





[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier

2016-03-23 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/11929

[SPARK-13934][SQL] fixed table identifier

## What changes were proposed in this pull request?
A table identifier that starts in the form of scientific notation (like 1e34) will throw an exception:

    val tableName = "1e34abcd"
    hc.sql("select 123").registerTempTable(tableName)
    hc.dropTempTable(tableName)

The last line throws a RuntimeException (java.lang.RuntimeException: [1.1] failure: identifier expected).

Fix this by changing the scientific notation parser: if a scientific-notation literal is followed by one or more identifier characters, it is not treated as a valid token.
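
The idea in isolation (a regex-based sketch, not the actual lexer change):

    // A numeric token only counts as scientific notation if nothing
    // identifier-like follows it; "1e34abcd" must fall through to the
    // identifier rule instead.
    val scientific = """[0-9]+(\.[0-9]+)?[eE][0-9]+""".r

    def isScientificToken(s: String): Boolean =
      scientific.findPrefixOf(s).contains(s)

    // isScientificToken("1e34")     == true
    // isScientificToken("1e34abcd") == false -> lexed as an identifier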

## How was this patch tested?

Unit test is added.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11929


commit 81287d31648b229bd3e617ef9ebce985fb54dca0
Author: wangyang <wangy...@haizhi.com>
Date:   2016-03-24T04:30:27Z

fixed table identifier







[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing

2016-03-20 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/11762#discussion_r56349547
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/CatalystQlSuite.scala ---
@@ -171,6 +171,7 @@ class CatalystQlSuite extends PlanTest {
    test("table identifier") {
      assert(TableIdentifier("q") === parser.parseTableIdentifier("q"))
      assert(TableIdentifier("q", Some("d")) === parser.parseTableIdentifier("d.q"))
+    assert(TableIdentifier("104e4d676bac4d9aa3856f00b5b9f51c") ===
+      parser.parseTableIdentifier("104e4d676bac4d9aa3856f00b5b9f51c"))
--- End diff --

Yeah, I cannot reproduce this problem in master. I'm closing this pr.





[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing

2016-03-19 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/11762

[SPARK-13934][SQL] Fixed table name parsing

## What changes were proposed in this pull request?

    val tableName = "1e34abcd"
    hc.sql("select 123").registerTempTable(tableName)
    hc.dropTempTable(tableName)

The last line throws a RuntimeException (java.lang.RuntimeException: [1.1] failure: identifier expected).

Fix this by changing the scientific notation parser: if a scientific-notation literal is followed by one or more identifier characters, it is not treated as a valid token.

## How was this patch tested?

Unit test is added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11762


commit b4ea5b5025208acfa55d8cb0f57a2d36f4391653
Author: wangyang <wangy...@haizhi.com>
Date:   2016-03-16T12:54:38Z

Fixed table name parsing







[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing

2016-03-19 Thread wangyang1992
Github user wangyang1992 closed the pull request at:

https://github.com/apache/spark/pull/11762





[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/10994

[SPARK-13100] [SQL] improving the performance of stringToDate method in DateTimeUtils.scala

Use an instance variable to hold a GMT TimeZone object instead of instantiating it every time.
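
The change in miniature (a standalone sketch; getOffset(0L) stands in for the arithmetic stringToDate actually does):

    import java.util.TimeZone

    object DateTimeSketch {
      // After: one lookup, reused by every call (the patch's approach).
      final val TimeZoneGMT: TimeZone = TimeZone.getTimeZone("GMT")

      // Before: every call repeats the TimeZone lookup.
      def offsetPerCall: Int = TimeZone.getTimeZone("GMT").getOffset(0L)

      // After: the hoisted instance is reused.
      def offsetHoisted: Int = TimeZoneGMT.getOffset(0L)
    }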

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark datetimeUtil

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10994.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10994


commit 19defc9c83da6206288c7ee70ce97f2e08603f72
Author: wangyang <wangy...@haizhi.com>
Date:   2016-01-30T08:33:40Z

improving the performance of stringToDate method in DateTimeUtils.scala







[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...

2016-01-30 Thread wangyang1992
Github user wangyang1992 commented on the pull request:

https://github.com/apache/spark/pull/10994#issuecomment-177151828
  
@srowen No, just that one in this file.

