[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66575743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper { .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a) } } + + /** + * Drop the non-partition key expression in the disjunctions, to optimize the partition pruning. + * For instances: (We assume part1 & part2 are the partition keys) + * (part1 == 1 and a > 3) or (part2 == 2 and a < 5) ==> (part1 == 1 or part1 == 2) + * (part1 == 1 and a > 3) or (a < 100) => None + * (a > 100 && b < 100) or (part1 = 10) => None + * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2) + * @param predicate disjunctions + * @param partitionKeyIds partition keys in attribute set + * @return + */ + def partitionPrunningFromDisjunction( +predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = { +// ignore the pure non-partition key expression in conjunction of the expression tree +val additionalPartPredicate = predicate transformUp { + case a @ And(left, right) if a.deterministic && +left.references.intersect(partitionKeyIds).isEmpty => right + case a @ And(left, right) if a.deterministic && +right.references.intersect(partitionKeyIds).isEmpty => left --- End diff -- Great point @clockfly , but maybe the optimizer will turn this expression to (!(partition = 1) || !(a > 3)) ? [BooleanSimplification](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L907) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
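The rewrite described in this diff's docstring can be illustrated with a small, self-contained sketch. Note that this uses a toy boolean-expression ADT with invented names (`Expr`, `Pred`, `prunePartitionPredicate`), not Catalyst's actual `Expression`/`transformUp` API; it only demonstrates the pruning rule under those assumptions: partition-key conjuncts are kept, and a disjunct with no partition-key conjunct makes the whole rewrite unusable (`None`).

```scala
// Toy expression tree — illustrative names only, not Catalyst's API.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Pred(attr: Attr, op: String, value: Int) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

// Attributes referenced by an expression.
def references(e: Expr): Set[String] = e match {
  case Attr(n)       => Set(n)
  case Pred(a, _, _) => references(a)
  case And(l, r)     => references(l) ++ references(r)
  case Or(l, r)      => references(l) ++ references(r)
}

// Drop non-partition-key conjuncts; if any disjunct has no partition-key
// conjunct at all, the whole predicate gives no pruning information.
def prunePartitionPredicate(e: Expr, partKeys: Set[String]): Option[Expr] = e match {
  case Or(l, r) =>
    for {
      pl <- prunePartitionPredicate(l, partKeys)
      pr <- prunePartitionPredicate(r, partKeys)
    } yield Or(pl, pr)
  case And(l, r) =>
    (prunePartitionPredicate(l, partKeys), prunePartitionPredicate(r, partKeys)) match {
      case (Some(pl), Some(pr)) => Some(And(pl, pr))
      case (Some(pl), None)     => Some(pl)
      case (None, Some(pr))     => Some(pr)
      case (None, None)         => None
    }
  case p: Pred if references(p).subsetOf(partKeys) => Some(p)
  case _ => None
}
```

Against the docstring's examples, `(part1 == 1 and a > 3) or (part2 == 2 and a < 5)` reduces to `part1 == 1 or part2 == 2`, while `(part1 == 1 and a > 3) or (a < 100)` yields `None` because the second disjunct constrains no partition key.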
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66564745 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala --- @@ -65,4 +69,95 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl sql("DROP TABLE IF EXISTS createAndInsertTest") } } + + test("partition pruning in disjunction") { +withSQLConf((SQLConf.HIVE_VERIFY_PARTITION_PATH.key, "true")) { + val testData = sparkContext.parallelize( +(1 to 10).map(i => TestData(i, i.toString))).toDF() + testData.registerTempTable("testData") + + val testData2 = sparkContext.parallelize( +(11 to 20).map(i => TestData(i, i.toString))).toDF() + testData2.registerTempTable("testData2") + + val testData3 = sparkContext.parallelize( +(21 to 30).map(i => TestData(i, i.toString))).toDF() + testData3.registerTempTable("testData3") + + val testData4 = sparkContext.parallelize( +(31 to 40).map(i => TestData(i, i.toString))).toDF() + testData4.registerTempTable("testData4") + + val tmpDir = Files.createTempDir() + // create the table for test + sql(s"CREATE TABLE table_with_partition(key int,value string) " + +s"PARTITIONED by (ds string, ds2 string) location '${tmpDir.toURI.toString}' ") + sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='1', ds2='d1') " + +"SELECT key,value FROM testData") + sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='2', ds2='d1') " + +"SELECT key,value FROM testData2") + sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='3', ds2='d3') " + +"SELECT key,value FROM testData3") + sql("INSERT OVERWRITE TABLE table_with_partition partition (ds='4', ds2='d4') " + +"SELECT key,value FROM testData4") + + checkAnswer(sql("select key,value from table_with_partition"), +testData.collect ++ testData2.collect ++ testData3.collect ++ testData4.collect) + + checkAnswer( +sql( + """select key,value from table_with_partition +| where (ds='4' and key=38) or 
(ds='3' and key=22)""".stripMargin), + Row(38, "38") :: Row(22, "22") :: Nil) + + checkAnswer( +sql( + """select key,value from table_with_partition +| where (key<40 and key>38) or (ds='3' and key=22)""".stripMargin), +Row(39, "39") :: Row(22, "22") :: Nil) + + sql("DROP TABLE table_with_partition") + sql("DROP TABLE createAndInsertTest") --- End diff -- Not really sure why we should drop "createAndInsertTest"; I can't find it anywhere. Maybe the temp tables named "testData*" are the ones that should be dropped. ^_^
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66563744 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper { .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a) } } + + /** + * Drop the non-partition key expression in the disjunctions, to optimize the partition pruning. --- End diff -- "Drop the non-partition key expression in the disjunctions". Should it be "conjunctions"?
[GitHub] spark pull request #13522: [SPARK-14321][SQL] Reduce date format cost and st...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13522#discussion_r65899067 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -435,20 +437,23 @@ abstract class UnixTime extends BinaryExpression with ExpectsInputTypes { case StringType if right.foldable => val sdf = classOf[SimpleDateFormat].getName val fString = if (constFormat == null) null else constFormat.toString -val formatter = ctx.freshName("formatter") if (fString == null) { ev.copy(code = s""" boolean ${ev.isNull} = true; ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""") } else { + val formatter = ctx.freshName("formatter") + ctx.addMutableState(sdf, formatter, s"""$formatter = null;""") --- End diff -- Not very familiar with codegen, but I wonder if we can add the instantiation here and avoid the null checking below. ctx.addMutableState(sdf, formatter, s"""$formatter = new $sdf("$fString");""")
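The trade-off being suggested can be sketched in plain Scala (this is not the actual generated Java code, and the class names `LazyInit`/`EagerInit` are invented for illustration): a lazily initialized cached formatter needs a null check on every call, while eager initialization in the field's init code removes that per-row branch.

```scala
import java.text.SimpleDateFormat
import java.util.Date

// Mirrors `$formatter = null;` — the null check runs on every format call.
class LazyInit(pattern: String) {
  private var formatter: SimpleDateFormat = null
  def format(millis: Long): String = {
    if (formatter == null) formatter = new SimpleDateFormat(pattern) // checked each call
    formatter.format(new Date(millis))
  }
}

// Mirrors `$formatter = new $sdf("$fString");` — no later null check needed.
class EagerInit(pattern: String) {
  private val formatter = new SimpleDateFormat(pattern)
  def format(millis: Long): String = formatter.format(new Date(millis))
}
```

Both variants produce identical output; the eager form is viable here precisely because `fString` is a foldable constant known at codegen time.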
[GitHub] spark pull request #13522: [SPARK-14321][SQL] Reduce date format cost and st...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13522#discussion_r65898385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -554,14 +561,19 @@ case class FromUnixTime(sec: Expression, format: Expression) boolean ${ev.isNull} = true; ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};""") } else { +val sdfTerm = ctx.freshName("formatter") --- End diff -- This is trivial, but why use a different variable name here than the one above (which is called "formatter")?
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object PartitioningUtils { case _ => throw new AnalysisException(s"Cannot use ${field.dataType} for partition column") } } + +if (partitionColumns.size == schema.fields.size) { + throw new AnalysisException(s"Cannot use all columns for partition columns") +} } --- End diff -- Yeah, I think it's better.
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object PartitioningUtils { case _ => throw new AnalysisException(s"Cannot use ${field.dataType} for partition column") } } + +if (partitionColumns.size == schema.fields.size) { + throw new AnalysisException(s"Cannot use all columns for partition columns") +} } --- End diff -- One little concern: if the check is added here, should the method name be changed? After all, the method will do more than validate data types after this change.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-221457800 Thanks @rxin . Added it.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r64335588 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { try { Option(hive.getFunction(db, name)).map(fromHiveFunction) } catch { - case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) => + case e: Throwable if isCausedBy(e, s"$name does not exist") => --- End diff -- @andrewor14 thanks. Changed to NonFatal.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r64334408 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { try { Option(hive.getFunction(db, name)).map(fromHiveFunction) } catch { - case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) => + case e: Throwable if isCausedBy(e, s"$name does not exist") => --- End diff -- @andrewor14 will this work?
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-221147177 Hi @andrewor14, I have checked the CausedBy source code, and I think it returns the root cause of the exception being thrown, not the first exception. I copied the CausedBy source code and created a notebook. (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/390461470296902/58107563000366/latest.html) Would you please go over it sometime? If this is the situation you are worried about, I think we can catch it.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-220880826 Hi @andrewor14, sorry to bother you, but does this PR need to be refined further, or is it ready to merge? Could you please give me some guidance? Thanks.
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220834946 @cloud-fan It failed on some unrelated cases too; can you help me trigger a retest?
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13169#discussion_r64150440 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -353,6 +353,20 @@ class DateTimeUtilsSuite extends SparkFunSuite { c.getTimeInMillis * 1000 + 123456) } + test("SPARK-15379: special invalid date string") { +// Test stringToDate +assert(stringToDate( + UTF8String.fromString("2015-02-29 00:00:00")).isEmpty) --- End diff -- Added tests against date strings.
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220556056 Retest this please
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220549615 Seems like an unrelated failure. Retest it please.
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220531075 Fixed Scala style. Retest it please.
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220523060 Addressed your comments. @cloud-fan
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220512727 @cloud-fan Could you please take a look at this sometime? It's a simple fix.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-220501798 @andrewor14 Thanks :-). Do I still need to modify the code? Frankly, I don't really understand your comment above. ("this won't actually work because it'll find the first exception it sees and tries to match the message. You'll need to do this recursively and match all the messages in the exception stack")
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r63815687 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { try { Option(hive.getFunction(db, name)).map(fromHiveFunction) } catch { - case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) => + case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does not exist") => --- End diff -- The objective here is not to catch all exceptions, only the ones caused by the function not existing. In my case, the exception is "org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.func does not exist))", whose root cause is MetaException, but it may vary in different situations (I'm not sure it actually varies; this is conjecture based on previous code. See PRs #12198 and #12853).
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13169#discussion_r63814998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -58,6 +58,7 @@ object DateTimeUtils { final val YearZero = -17999 final val toYearZero = to2001 + 7304850 final val TimeZoneGMT = TimeZone.getTimeZone("GMT") + final val MonthOf31Days = Set(1,3,5,7,8,10,12) --- End diff -- Indentation fixed
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r63773414 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { try { Option(hive.getFunction(db, name)).map(fromHiveFunction) } catch { - case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) => + case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does not exist") => --- End diff -- @andrewor14 Maybe it is safer this way. What do you think?
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
GitHub user wangyang1992 opened a pull request: https://github.com/apache/spark/pull/13177 [SPARK-15388][SQL] Fix spark sql CREATE FUNCTION using hive 1.2.1 ## What changes were proposed in this pull request? spark.sql("CREATE FUNCTION myfunc AS 'com.haizhi.bdp.udf.UDFGetGeoCode'") throws "org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:NoSuchObjectException(message:Function default.myfunc does not exist))" using hive 1.2.1. I think it was introduced by PR #12853. Fix it by catching Exception (not just NoSuchObjectException) and matching on the message string. ## How was this patch tested? Added a unit test and also tested it manually. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyang1992/spark fixCreateFunc2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13177.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13177 commit 08435b91b07a9f9aebde493aeec5725e28756ea7 Author: wangyang <wangy...@haizhi.com> Date: 2016-05-18T19:56:19Z fix create table with hive 1.2.1
[GitHub] spark pull request: [SPARK-14414][SQL] Make DDL exceptions more co...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/12853#discussion_r63763111 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -616,7 +619,8 @@ private[hive] class HiveClientImpl( try { Option(client.getFunction(db, name)).map(fromHiveFunction) } catch { - case he: HiveException => None + case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) => --- End diff -- In my case, the exception thrown is "org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:NoSuchObjectException(message:Function default.myfunc does not exist))", but it turns out that the root cause of this exception is a MetaException whose message is "NoSuchObjectException(message:Function default.myfunc does not exist))", so the exception is not caught. (I ran into this problem when I used "CREATE FUNCTION" in Spark SQL with Hive.)
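The recursive matching discussed in this thread, walking the whole cause chain and checking the message at every level rather than only at the root cause, can be sketched in plain Scala over `java.lang.Throwable`. This is an illustrative stand-in, not Spark's actual `CausedBy` extractor or the PR's final `isCausedBy` helper.

```scala
// Walk the cause chain; return true if any level's message contains the
// fragment. Matching every level (not just the root) handles wrapper
// exceptions like HiveException -> MetaException -> NoSuchObjectException.
@annotation.tailrec
def causedByMessage(t: Throwable, fragment: String): Boolean =
  if (t == null) false
  else if (t.getMessage != null && t.getMessage.contains(fragment)) true
  else causedByMessage(t.getCause, fragment)
```

With a nested exception shaped like the one quoted above, the fragment "does not exist" is found even though it only appears deep in the chain.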
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
GitHub user wangyang1992 opened a pull request: https://github.com/apache/spark/pull/13169 [SPARK-15379][SQL] check special invalid date ## What changes were proposed in this pull request? When an invalid date string like "2015-02-29 00:00:00" was cast as date or timestamp using Spark SQL, it did not return null but another valid date (2015-03-01 in this case). With this PR, invalid date strings like "2015-02-29" and "2016-04-31" return null when cast as date or timestamp. ## How was this patch tested? Unit tests are added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyang1992/spark invalid_date Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13169 commit ef73d79bacc2eab8cbead7aa8991b4ec7de3b862 Author: wangyang <wangy...@haizhi.com> Date: 2016-05-18T10:04:14Z check special invalid date
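The intended behavior can be demonstrated with `java.time`'s strict resolver. Note the PR itself patches Spark's hand-rolled parser in `DateTimeUtils` (which returns `Option[Int]` over `UTF8String`), not `java.time`; this sketch only shows the semantics: calendar-invalid dates yield `None` instead of rolling over to the next valid day.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException, ResolverStyle}

// STRICT resolution rejects dates like Feb 29 in a non-leap year instead of
// adjusting them; "uuuu" (proleptic year) is required with STRICT.
val strictFormat: DateTimeFormatter =
  DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT)

// Illustrative helper name; returns None for calendar-invalid date strings.
def stringToDateStrict(s: String): Option[LocalDate] =
  try Some(LocalDate.parse(s, strictFormat))
  catch { case _: DateTimeParseException => None }
```

Under lenient resolution (or `java.util.Calendar` with default leniency), "2015-02-29" silently becomes 2015-03-01, which is exactly the behavior this PR fixes.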
[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/11929#issuecomment-200703478 In my case, my application processes lots of auto-generated table identifiers. Some of them use backticks and some of them do not. If we upgrade to 1.6.1 without fixing this issue, the existing code will break and we would have to check whether each table identifier uses backticks all over the place (if an identifier already uses backticks, we cannot add them again). That means changing a lot of code.
[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/11929#issuecomment-200694207 BTW, I cannot reproduce this problem on master. I opened this PR in case there is another release from this branch.
[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier
GitHub user wangyang1992 opened a pull request: https://github.com/apache/spark/pull/11929 [SPARK-13934][SQL] fixed table identifier

## What changes were proposed in this pull request?

A table identifier that starts in the form of scientific notation (like 1e34) throws an exception:

    val tableName = "1e34abcd"
    hc.sql("select 123").registerTempTable(tableName)
    hc.dropTempTable(tableName)

The last line throws a RuntimeException (java.lang.RuntimeException: [1.1] failure: identifier expected). This is fixed by changing the scientific notation parser: if scientific notation is followed by one or more identifier characters, it is no longer treated as a numeric token.

## How was this patch tested?

A unit test is added.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyang1992/spark branch-1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11929.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11929

commit 81287d31648b229bd3e617ef9ebce985fb54dca0 Author: wangyang <wangy...@haizhi.com> Date: 2016-03-24T04:30:27Z fixed table identifier
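The lexing ambiguity the PR fixes can be illustrated with a small regex sketch. This is a hypothetical re-creation in Java, not the actual Catalyst parser rule; the patterns and the `classify` helper are assumptions made for illustration:

```java
import java.util.regex.Pattern;

public class TokenSketch {
    // A bare scientific-notation literal, e.g. "1e34".
    private static final Pattern SCI_NOTATION = Pattern.compile("\\d+[eE]\\d+");
    // The same prefix continued by identifier characters, e.g. "1e34abcd":
    // per the fix, this should lex as an identifier, not a malformed number.
    private static final Pattern SCI_THEN_IDENT =
        Pattern.compile("\\d+[eE]\\d+[a-zA-Z_]\\w*");

    static String classify(String token) {
        if (SCI_NOTATION.matcher(token).matches()) return "number";
        if (SCI_THEN_IDENT.matcher(token).matches()) return "identifier";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("1e34"));     // number
        System.out.println(classify("1e34abcd")); // identifier
    }
}
```

Auto-generated names such as "104e4d676bac4d9aa3856f00b5b9f51c" (a hex UUID) start with a scientific-notation-shaped prefix, which is why they trip the original lexer.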
[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/11762#discussion_r56349547 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/CatalystQlSuite.scala --- @@ -171,6 +171,7 @@ class CatalystQlSuite extends PlanTest { test("table identifier") { assert(TableIdentifier("q") === parser.parseTableIdentifier("q")) assert(TableIdentifier("q", Some("d")) === parser.parseTableIdentifier("d.q")) +assert(TableIdentifier("104e4d676bac4d9aa3856f00b5b9f51c") === parser.parseTableIdentifier("104e4d676bac4d9aa3856f00b5b9f51c")) --- End diff -- Yeah, I cannot reproduce this problem in master. I'm closing this pr.
[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing
GitHub user wangyang1992 opened a pull request: https://github.com/apache/spark/pull/11762 [SPARK-13934][SQL] Fixed table name parsing

## What changes were proposed in this pull request?

    val tableName = "1e34abcd"
    hc.sql("select 123").registerTempTable(tableName)
    hc.dropTempTable(tableName)

The last line throws a RuntimeException (java.lang.RuntimeException: [1.1] failure: identifier expected). This is fixed by changing the scientific notation parser: if scientific notation is followed by one or more identifier characters, it is no longer treated as a numeric token.

## How was this patch tested?

A unit test is added.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyang1992/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11762

commit b4ea5b5025208acfa55d8cb0f57a2d36f4391653 Author: wangyang <wangy...@haizhi.com> Date: 2016-03-16T12:54:38Z Fixed table name parsing
[GitHub] spark pull request: [SPARK-13934][SQL] Fixed table name parsing
Github user wangyang1992 closed the pull request at: https://github.com/apache/spark/pull/11762
[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...
GitHub user wangyang1992 opened a pull request: https://github.com/apache/spark/pull/10994 [SPARK-13100] [SQL] improving the performance of stringToDate method in DateTimeUtils.scala

Use an instance variable to hold a GMT TimeZone object instead of instantiating one on every call.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyang1992/spark datetimeUtil Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10994.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10994

commit 19defc9c83da6206288c7ee70ce97f2e08603f72 Author: wangyang <wangy...@haizhi.com> Date: 2016-01-30T08:33:40Z improving the performance of stringToDate method in DateTimeUtils.scala
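The hoisting pattern the PR describes can be sketched as follows. This is a Java approximation, not the actual Scala `stringToDate` code; the `daysSinceEpoch` helper is a hypothetical stand-in for the hot path:

```java
import java.util.Calendar;
import java.util.TimeZone;

public class DateUtilSketch {
    // Look the GMT zone up once instead of on every call: TimeZone.getTimeZone
    // performs a registry lookup, so hoisting it out of the per-string parse
    // path removes repeated work from a method called once per input row.
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Hypothetical hot-path helper: convert a calendar date to days since the
    // Unix epoch, reusing the cached GMT zone for every call.
    static long daysSinceEpoch(int year, int month, int day) {
        Calendar c = Calendar.getInstance(GMT);
        c.clear();
        c.set(year, month - 1, day); // Calendar months are zero-based
        return c.getTimeInMillis() / (24L * 3600 * 1000);
    }

    public static void main(String[] args) {
        System.out.println(daysSinceEpoch(1970, 1, 2)); // 1
    }
}
```

The `Calendar` instance itself is still created per call here; only the zone lookup is cached, which matches the scope of the change described in the PR.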
[GitHub] spark pull request: [SPARK-13100] [SQL] improving the performance ...
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/10994#issuecomment-177151828 @srowen No, just that one in this file.