[GitHub] spark pull request: [SPARK-14883][DOCS] Fix wrong R examples and m...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12649#issuecomment-213907830 Hi, @davies , @shivaram, @felixcheung . Could you review this PR when you have some time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14664][SQL] Fix DecimalAggregates optim...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12421#issuecomment-213907928 Rebased.
[GitHub] spark pull request: [SPARK-15020][SQL] GROUP-BY should support Ali...
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12794

[SPARK-15020][SQL] GROUP-BY should support Aliases

## What changes were proposed in this pull request?

`GROUP-BY` clauses raise **AnalysisException** for aliases while `ORDER-BY` clauses support them correctly. This PR aims to support aliases in the `GROUP-BY` clause. This is a frequently used syntax that avoids unnecessary repetition.

```
scala> sql("select 1 a group by a").head
org.apache.spark.sql.AnalysisException: cannot resolve '`a`' given input columns: []; line 1 pos 20

scala> sql("select 1 a order by a").head
res1: org.apache.spark.sql.Row = [1]
```

## How was this patch tested?

Pass the Jenkins tests (including a new testcase)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15020

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12794.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12794

commit 71249cfeb9e798e41d8ef0f7423b48685a1774e7
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-29T23:17:05Z

[SPARK-15020][SQL] GROUP-BY should support Aliases

```
sql("select a x from values 1 T(a) group by x").explain
org.apache.spark.sql.AnalysisException: cannot resolve '`x`' given input columns: [a]; line 1 pos 39
```
[GitHub] spark pull request: [SPARK-15020][SQL] GROUP-BY should support Ali...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12794#issuecomment-215910465 Oh, Spark prefers `ordinal`. Thank you again, @gatorsmile . I'll close this PR and JIRA together now.
[GitHub] spark pull request: [SPARK-15020][SQL] GROUP-BY should support Ali...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/12794
[GitHub] spark pull request: [SPARK-15020][SQL] GROUP-BY should support Ali...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12794#issuecomment-215910061 Oops. Thank you for notifying me. I'll.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215940750 Hi, @cloud-fan . Could you review this PR again when you have some time?
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215943715 I thought this was just a bug fix, but I think I need to update it to use `SparkSession`, too. I will add a commit very soon.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12809

[SPARK-15031][EXAMPLE] Fix SQL Python example.

## What changes were proposed in this pull request?

Currently, the Python SQL example, `sql.py`, fails due to the following two lines. This PR fixes them as follows.

```
-people = sqlContext.jsonFile(path)
+people = sqlContext.read.json(path)
...
-people.registerAsTable("people")
+people.registerTempTable("people")
```

## How was this patch tested?

Run `bin/spark-submit examples/src/main/python/sql.py`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15031

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12809.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12809

commit f2087fdbd4ebd7cbb94c5f866f21011a1e4d2212
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-30T07:16:01Z

[SPARK-15031][EXAMPLE] Fix SQL Python example.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215945687 As you know, for the example, I need to verify the result manually. I'll proceed with the testsuite one first, because it is verified automatically, while this one is marked as [WIP].
[GitHub] spark pull request: [MINOR][EXAMPLE] Use SparkSession instead of S...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12808#issuecomment-215943551 Oh, sure! I will take a look at SQL testsuites and make a single PR for that. Thank you, @rxin .
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215943997 I mean updating to `SparkSession` here, since I didn't run all the examples.
[GitHub] spark pull request: [MINOR][EXAMPLE] Use SparkSession instead of S...
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12808

[MINOR][EXAMPLE] Use SparkSession instead of SQLContext in RDDRelation.scala

## What changes were proposed in this pull request?

Since `SQLContext` is now used for backward compatibility, we had better use `SparkSession` in Spark 2.0 examples.

## How was this patch tested?

It's just an example change. After building, run `bin/run-example org.apache.spark.examples.sql.RDDRelation`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark rddrelation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12808.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12808

commit 7ea07b5e73bc17c78e175d0a9f603d301259005d
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-30T06:17:41Z

[MINOR][EXAMPLE] Use SparkSession instead of SQLContext in RDDRelation.scala.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215943886 Oh, sure. May I proceed with this PR for all examples and all testsuites together?
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215943910 If you don't mind, I prefer to do it as a single one. I can update the JIRA and PR description.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Fix SQL Python example.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215945517 I see. No problem. Then, I'll use this one for all the example changes (including some fixes like the one here).
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Use SparkSession in Sca...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215996680 Now, I have addressed all comments so far. Thank you for the fast reviews, @rxin .
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Use SparkSession in Sca...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215992910

Hi, @rxin . For this issue, I'll add a new constructor for `SparkSession` and proceed to the Java examples.

```
def this(sparkContext: JavaSparkContext) = this(sparkContext.sc)
```

Please let me know if there is any problem with this.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Use SparkSession in Sca...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-215995243 I updated the Java examples with `SparkSession(JavaSparkContext)`. For `SparkSession(SparkConf)`, I'll handle it soon.
[GitHub] spark pull request: [SPARK-15031][EXAMPLE] Use SparkSession in Sca...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12809#issuecomment-216012424 Rebased.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215616875 Thank you for the review, @cloud-fan ! Do you mean removing aliases by replacing the base expression(?) using `transformUp`? Maybe except the topmost aliases?
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620369 If that's just about how to handle `Sort(_, Project(_,_))` expressions in `EliminateSorts`, I can easily modify this PR according to your advice. After the foldables are moved up, the existing `case` statement eventually removes them.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620961 Right. Thank you so much for enriching ideas! I'll update this PR with `FoldablePropagation`.
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215619413 Actually, `Sort` is a dead end; we cannot propagate up any further. So, in that case, removing looks more efficient. Do you mean a more generalized `FoldablePropagation`, like `NullPropagation`, for 'not only Sort'?
[GitHub] spark pull request: [SPARK-14939][SQL] Improve EliminateSorts opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-215620522

Oh, I got it. Thanks. I will try to generalize.

* Sort(_, Project(_))
* Project(_, Project(...))

And so on.
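For readers following the thread, the idea of moving foldables up through a `Project` and then letting a sort-elimination step drop them can be illustrated with a toy model. This is only a sketch in plain Python, not Catalyst code and not the code in the PR: `Literal`, `Column`, `propagate_foldables`, and `eliminate_constant_sort_keys` are hypothetical names invented for illustration, and a plan is reduced to a dict of projected aliases plus a list of sort keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Literal:
    """A constant (foldable) expression in the toy model."""
    value: object


@dataclass(frozen=True)
class Column:
    """A reference to a named column or alias in the toy model."""
    name: str


Expr = Union[Literal, Column]


def propagate_foldables(project: Dict[str, Expr], sort_keys: List[Expr]) -> List[Expr]:
    """Replace sort keys that refer to a literal-valued alias with that literal."""
    foldable = {name: e for name, e in project.items() if isinstance(e, Literal)}
    return [foldable.get(k.name, k) if isinstance(k, Column) else k for k in sort_keys]


def eliminate_constant_sort_keys(sort_keys: List[Expr]) -> List[Expr]:
    """Drop sort keys that are constants, since they cannot affect the ordering."""
    return [k for k in sort_keys if not isinstance(k, Literal)]


# Toy equivalent of: SELECT 1 AS x, b AS y ... ORDER BY x, y
project = {"x": Literal(1), "y": Column("b")}
sort_keys = [Column("x"), Column("y")]

propagated = propagate_foldables(project, sort_keys)
remaining = eliminate_constant_sort_keys(propagated)
print(remaining)
```

In this model the propagation step only rewrites references, and the removal is left to a separate step, which mirrors the division of labor discussed in the comments above.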
[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12676#issuecomment-214822460 Hi, @mengxr . Could you review this too?
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12590#issuecomment-214831287 Hi, @marmbrus . Could you review this PR when you have some time?
[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12676#issuecomment-214605818 Hi, @jkbradley . Could you review this PR when you have some time?
[GitHub] spark pull request: [MINOR][BUILD] Enable RAT checking on `LZ4Bloc...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12677#issuecomment-214606445 Hi, @davies and @srowen . This PR just removes `LZ4BlockInputStream.java` from `dev/.rat-exclude` and passed the RAT test. Could you merge this PR?
[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12676#issuecomment-214842843 In fact, I didn't try to change that if it's just a style problem.
[GitHub] spark pull request: [SPARK-14907][MLLIB] Use repartition in GLMReg...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12676#issuecomment-214842611 If you think so, it's okay, @jkbradley . But, if you don't mind, could you remove those TODOs yourself? Do you have any reason to keep them? TODO comments always mislead community developers like me.
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12684#issuecomment-214612978 It's a clean build after `git clean -fdx`.
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12684#issuecomment-214612753

Ur, @rxin . Could you check that again?

```
[error] /Users/dongjoon/spark/sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:50: super may not be used on value sparkSession
[error] new HiveContext(super.sparkSession.newSession(), isRootContext = false)
[error] ^
[error] /Users/dongjoon/spark/sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:54: super may not be used on value sparkSession
[error] super.sparkSession.sessionState.asInstanceOf[HiveSessionState]
[error] ^
[error] /Users/dongjoon/spark/sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:58: super may not be used on value sparkSession
[error] super.sparkSession.sharedState.asInstanceOf[HiveSharedState]
[error] ^
[error] three errors found
```
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12684#issuecomment-214615564 Thank you! :)
[GitHub] spark pull request: [SPARK-14664][SQL] Implement DecimalAggregates...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12421#issuecomment-214616662 Rebased.
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12684#issuecomment-214608530 cc @rxin .
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12684

[HOTFIX][SQL] sparkSession can't be private.

## What changes were proposed in this pull request?

This fixes the following errors.

```
-@transient private val sparkSession: SparkSession,
+@transient override val sparkSession: SparkSession,
```

## How was this patch tested?

Pass the Jenkins build.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark hotfix_sparksession

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12684.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12684

commit 1c8b2469f0d1ca4304ccd42b0586f52b3ea1e376
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-26T04:18:25Z

sparkSession can't be private.
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/12684
[GitHub] spark pull request: [HOTFIX][SQL] sparkSession can't be private.
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12684#issuecomment-214609078 Sure! Thanks for the quick fix. I found that when I rebased my PR.
[GitHub] spark pull request: [SPARK-14796][SQL] Add spark.sql.optimizer.inS...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12562#issuecomment-213177718 Hi, @rxin and @marmbrus . What do you think about the updated PR? It's just the first update. If there is something more to do, please let me know. Thank you.
[GitHub] spark pull request: [SPARK-14830][SQL] Add RemoveRepetitionFromGro...
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12590

[SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.

## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**
```scala
scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, a").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], output=[a#5])
:     +- INPUT
+- Exchange hashpartitioning(a#5, a#5, a#5, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], output=[a#5,a#5,a#5])
      :     +- INPUT
      +- Generate explode([1,2]), false, false, [a#5]
         +- Scan OneRowRelation[]
```

**After**
```scala
scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, a").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[a#5], functions=[], output=[a#5])
:     +- INPUT
+- Exchange hashpartitioning(a#5, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[a#5], functions=[], output=[a#5])
      :     +- INPUT
      +- Generate explode([1,2]), false, false, [a#5]
         +- Scan OneRowRelation[]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-14830

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12590.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12590

commit 75b8f7353454cbe6aa8dafa070f51c90d0f430e5
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-22T00:08:32Z

[SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.
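The effect described in SPARK-14830 can be sketched outside Spark as follows. This is a toy model under stated assumptions, not the Catalyst rule itself: `Attr`, `Aggregate`, and `removeRepetition` are hypothetical names introduced only for illustration, and plain value equality stands in for whatever expression equality the real optimizer uses.

```scala
// Toy model of a grouping-key attribute such as a#5 (hypothetical, not a Catalyst class).
final case class Attr(name: String, id: Int) {
  override def toString: String = s"$name#$id"
}

// Toy stand-in for an aggregate node: only the grouping keys matter here.
final case class Aggregate(groupingExpressions: Seq[Attr])

object RemoveRepetitionSketch {
  // Drop repeated grouping keys while keeping first-occurrence order.
  def removeRepetition(agg: Aggregate): Aggregate =
    agg.copy(groupingExpressions = agg.groupingExpressions.distinct)

  def main(args: Array[String]): Unit = {
    val a = Attr("a", 5)
    val before = Aggregate(Seq(a, a, a))
    val after = removeRepetition(before)
    println(before.groupingExpressions.mkString("key=[", ",", "]"))
    println(after.groupingExpressions.mkString("key=[", ",", "]"))
  }
}
```

Grouping by `a, a, a` and grouping by `a` produce the same groups, so a shorter key list should mean less hashing and shuffling work with the same result.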
[GitHub] spark pull request: [SPARK-14883][DOCS] Fix wrong R examples and m...
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/12649

[SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date

## What changes were proposed in this pull request?

This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.

- Fix the wrong usage of map. We need to use `lapply` if needed. However, the usage of `lapply` also needs to be reviewed since it's private.
```
-teenNames <- map(teenagers, function(p) { paste("Name:", p$name)})
+teenNames <- SparkR:::lapply(teenagers, function(p) { paste("Name:", p$name) })
```
- Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
- Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-14883

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12649.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #12649

commit 5d6d45e07c15d17c5d1972733962013a6fcd228c
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-04-24T06:43:45Z

[SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12632#discussion_r60839935

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java ---
@@ -82,37 +81,37 @@ public static long computePrefix(double value) {
   //

   public static final class UnsignedPrefixComparator extends RadixSortSupport {
-    @Override public final boolean sortDescending() { return false; }
-    @Override public final boolean sortSigned() { return false; }
+    @Override public boolean sortDescending() { return false; }
+    @Override public boolean sortSigned() { return false; }
     @Override
-    public final int compare(long aPrefix, long bPrefix) {
+    public int compare(long aPrefix, long bPrefix) {
       return UnsignedLongs.compare(aPrefix, bPrefix);
     }
   }

   public static final class UnsignedPrefixComparatorDesc extends RadixSortSupport {
-    @Override public final boolean sortDescending() { return true; }
-    @Override public final boolean sortSigned() { return false; }
+    @Override public boolean sortDescending() { return true; }
+    @Override public boolean sortSigned() { return false; }
     @Override
-    public final int compare(long bPrefix, long aPrefix) {
+    public int compare(long bPrefix, long aPrefix) {
--- End diff --

Oh, it's definitely final. It's just a `RedundantModifier` error since the class `SignedPrefixComparator` is already `final`.
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12632#issuecomment-213906023

Hi, @rxin . Thank you for the review. FYI, here is the result of `dev/lint-java` on the current master branch.

```bash
spark:master$ dev/lint-java
Using `mvn` from path: /usr/local/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java:[259] (sizes) LineLength: Line is longer than 100 characters (found 103).
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[25,8] (imports) UnusedImports: Unused import - org.apache.spark.util.Utils.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[72,17] (modifier) ModifierOrder: 'abstract' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[85,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[86,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[88,12] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[94,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[95,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[97,12] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[103,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[104,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[106,12] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[112,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[113,22] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java:[115,12] (modifier) RedundantModifier: Redundant 'final' modifier.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java:[19] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/RadixSort.java:[230] (regexp) RegexpSingleline: No trailing whitespace allowed.
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[215] (sizes) LineLength: Line is longer than 100 characters (found 103).
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[41,8] (imports) UnusedImports: Unused import - org.apache.hadoop.fs.FileSystem.
[ERROR] src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java:[84,26] (misc) ArrayTypeStyle: Array brackets at illegal position.
[ERROR] src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java:[88,29] (misc) ArrayTypeStyle: Array brackets at illegal position.
[ERROR] src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java:[92,29] (misc) ArrayTypeStyle: Array brackets at illegal position.
[ERROR] src/test/java/org/apache/spark/ml/regression/JavaRandomForestRegressorSuite.java:[84,26] (misc) ArrayTypeStyle: Array brackets at illegal position.
[ERROR] src/test/java/org/apache/spark/ml/regression/JavaRandomForestRegressorSuite.java:[88,29] (misc) ArrayTypeStyle: Array brackets at illegal position.
[ERROR] src/test/java/org/apache/spark/ml/regression/JavaRandomForestRegressorSuite.java:[92,29] (misc) ArrayTypeStyle: Array brackets at illegal position.
```
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12632#discussion_r60839977

--- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java ---
@@ -81,15 +81,15 @@ public void runDT() {
     for (String featureSubsetStrategy: RandomForestClassifier.supportedFeatureSubsetStrategies()) {
       rf.setFeatureSubsetStrategy(featureSubsetStrategy);
     }
-    String realStrategies[] = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
+    String[] realStrategies = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
--- End diff --

Sure. That's the `ArrayTypeStyle` rule.
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12632#discussion_r60840026

--- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java ---
@@ -81,15 +81,15 @@ public void runDT() {
     for (String featureSubsetStrategy: RandomForestClassifier.supportedFeatureSubsetStrategies()) {
       rf.setFeatureSubsetStrategy(featureSubsetStrategy);
     }
-    String realStrategies[] = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
+    String[] realStrategies = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
--- End diff --

According to @andrewor14 , the Java linter is turned off intentionally due to the overhead of Maven.
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12632#discussion_r60840040

--- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaRandomForestClassifierSuite.java ---
@@ -81,15 +81,15 @@ public void runDT() {
     for (String featureSubsetStrategy: RandomForestClassifier.supportedFeatureSubsetStrategies()) {
       rf.setFeatureSubsetStrategy(featureSubsetStrategy);
     }
-    String realStrategies[] = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
+    String[] realStrategies = {".1", ".10", "0.10", "0.1", "0.9", "1.0"};
--- End diff --

I hope we can bring it back someday if possible.
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12632#issuecomment-213906191 Sure! It's just a one-line change. May I turn it on right now?
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12632#issuecomment-213907254

Interesting. @rxin . Jenkins is trying to use Maven 3.1.1 due to the mismatch between `--force` option and `lint-java`.

```
Using `mvn` from path: /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/mvn
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Checkstyle checks failed at following occurrences:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Unable to parse configuration of mojo org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check for parameter sourceDirectories: Cannot assign configuration entry 'sourceDirectories' with value '/home/jenkins/workspace/SparkPullRequestBuilder/src/main/java,/home/jenkins/workspace/SparkPullRequestBuilder/src/main/scala' of type java.lang.String to property of type java.util.List -> [Help 1]
```

Actually, my PR #12631 aims to handle that bug. Could you review that too?
[GitHub] spark pull request: [SPARK-14868][BUILD] Enable NewLineAtEofChecke...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12632#issuecomment-213907583 I reverted the last commit about Jenkins Java Linter.
[GitHub] spark pull request: [SPARK-14867][BUILD] Remove `--force` option i...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12631#issuecomment-213880918 Rebased.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220257517 Hi, @rxin . This is the first attempt according to your request. I removed some obsolete code in #11019 in order to pass the tests. Please let me know if I missed anything. cc @cloud-fan @nongli
[GitHub] spark pull request: [SPARK-15058][MLLIB][TEST] Enable Java Decisio...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/12840
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13087#discussion_r64002341

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1025,7 +1025,8 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
     // state and all the input rows processed before. In another word, the order of input rows
     // matters for non-deterministic expressions, while pushing down predicates changes the order.
     case filter @ Filter(condition, project @ Project(fields, grandChild))
-      if fields.forall(_.deterministic) =>
+      if fields.forall(_.deterministic) &&
+        fields.forall(_.find(_.isInstanceOf[ScalaUDF]).isEmpty) =>
--- End diff --

Great! No problem. I will try to fix the other test suites correctly.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220532622 As @rxin said, what was really needed is removing `overlapping` comments. So I reconsidered and reverted the change to `Expression.gen` that removed the `code` field. It has its own value. Instead, I can achieve the goal simply by adding `CodeFormatter.stripOverlappingComments`.
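The idea of stripping overlapping comments can be illustrated with a small standalone sketch. It is an assumption-laden simplification and not the code in the PR: `StripOverlappingSketch`, `commentText`, and `stripOverlapping` are hypothetical names, and "overlapping" is taken here to mean a one-line `/* ... */` comment whose text is already contained in the comment on the line directly above it.

```scala
import scala.collection.mutable.ArrayBuffer

object StripOverlappingSketch {
  // Returns the text inside a one-line /* ... */ comment, if the line is one.
  private def commentText(line: String): Option[String] = {
    val t = line.trim
    if (t.length >= 4 && t.startsWith("/*") && t.endsWith("*/")) {
      Some(t.substring(2, t.length - 2).trim)
    } else {
      None
    }
  }

  // Drops a comment line when the comment directly above it already contains its text.
  def stripOverlapping(code: String): String = {
    val kept = ArrayBuffer.empty[String]
    var previous: Option[String] = None // comment text of the previous input line, if any
    for (line <- code.split("\n")) {
      val current = commentText(line)
      val overlaps = (previous, current) match {
        case (Some(p), Some(c)) => p.contains(c)
        case _ => false
      }
      if (!overlaps) kept += line
      previous = current
    }
    kept.mkString("\n")
  }
}
```

With nested expressions, generated code tends to carry one comment for the outer expression followed by comments for each child, so dropping the contained ones keeps the outermost description without touching the code lines.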
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13087#discussion_r63997574

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1025,7 +1025,8 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
     // state and all the input rows processed before. In another word, the order of input rows
     // matters for non-deterministic expressions, while pushing down predicates changes the order.
     case filter @ Filter(condition, project @ Project(fields, grandChild))
-      if fields.forall(_.deterministic) =>
+      if fields.forall(_.deterministic) &&
+        fields.forall(_.find(_.isInstanceOf[ScalaUDF]).isEmpty) =>
--- End diff --

Thank you, @cloud-fan , again! Yep. Right. That is exactly what I wanted to do, and it is what my first commit for this PR did. But you can see the result above.

* the first initial commit: https://github.com/apache/spark/pull/13087/commits/85fa0406280802259cd141d8ad9fdeb9b9173405
* Jenkins result: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58531/consoleFull
* My decision:
> There are several cases which assumes UDF is deterministic. It would be a big change to user. I'll revert the change on ScalaUDF, and update this PR to change optimizer not to duplicate the UDF expression.

I still think that is the correct solution. I mean I totally agree with you. But, as you see, it requires changing other test suites, so I thought I needed the committers' decision to do that.
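The guard in the diff above can be modelled on a toy expression tree. The sketch below is not Catalyst code: `Expr`, `Column`, `Rand`, `Udf`, `Add`, and `canPushThroughProject` are hypothetical stand-ins, assumed only to mirror the shape of `deterministic` and `find` as they are used in the diff.

```scala
// Toy expression tree (hypothetical; it only mirrors the two members the guard relies on).
sealed trait Expr {
  def children: Seq[Expr]
  // An expression is deterministic when all of its children are.
  def deterministic: Boolean = children.forall(_.deterministic)
  // First node in pre-order that satisfies the predicate, if any.
  def find(p: Expr => Boolean): Option[Expr] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(e) => e }
}

final case class Column(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}

final case class Rand(seed: Long) extends Expr {
  val children: Seq[Expr] = Nil
  override def deterministic: Boolean = false
}

// A user-defined function call; treated as deterministic here, as in the discussion.
final case class Udf(name: String, children: Seq[Expr]) extends Expr

final case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

object PushDownGuardSketch {
  // Same shape as the condition in the diff: every projected field must be
  // deterministic and must not contain a UDF call anywhere in its tree.
  def canPushThroughProject(fields: Seq[Expr]): Boolean =
    fields.forall(_.deterministic) &&
      fields.forall(_.find(_.isInstanceOf[Udf]).isEmpty)
}
```

Pushing a filter below a projection inlines the projected expression into the filter condition, so a UDF in the projection would then appear, and be evaluated, in two places. The extra check blocks that without changing how `deterministic` is defined for UDFs.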
[GitHub] spark pull request: [SPARK-15057][GRAPHX] Remove stale TODO commen...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12839#issuecomment-220530065 Thank you, @rxin . :)
[GitHub] spark pull request: [SPARK-15282][SQL] Make ScalaUDF nondeterminis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-220659068 Thank you for the review, @marmbrus and @markhamstra ! Actually, it's a huge change. Although I'm not aware of the real background, the reported case can be handled just by preventing `PushDownPredicate` from duplicating UDF expressions. I think we can keep the common subexpression elimination without any change. Anyway, I will revert the current investigation back to my second commit.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220667461 Hi, @davies . It's ready for review, again!
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-220662788 @marmbrus @markhamstra @thunterdb . The code and description of this PR are now rolled back to my second commit from 7 days ago. As for `common subexpression elimination`, I think it's okay if an optimizer reduces the number of UDF calls. That is to be expected. What do you think about this?
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220682262 Yes. There was a huge change. I saw that PR before, but I didn't take it into account in this PR. My bad. Let me think about how to achieve the original goal on the new master branch. Thank you, @davies .
[GitHub] spark pull request: [SPARK-15282][SQL] Make ScalaUDF nondeterminis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-220547374 According to @cloud-fan 's advice, the goal of this PR is now to make `ScalaUDF` a non-deterministic expression. Although this is a correct fix, one noticeable drawback is that we cannot use ScalaUDF in GROUP BY. I updated the description of this PR, too.
[GitHub] spark pull request: [SPARK-15282][SQL] Make ScalaUDF nondeterminis...
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-220563755

Hmm. @cloud-fan . There is bad news. **ALS.scala** uses UDF on aggregation. So, there are 7 failures on **ALSSuite.scala**.

```scala
override def transform(dataset: Dataset[_]): DataFrame = {
  transformSchema(dataset.schema)
  // Register a UDF for DataFrame, and then
  // create a new column named map(predictionCol) by running the predict UDF.
  val predict = udf { (userFeatures: Seq[Float], itemFeatures: Seq[Float]) =>
    if (userFeatures != null && itemFeatures != null) {
      blas.sdot(rank, userFeatures.toArray, 1, itemFeatures.toArray, 1)
    } else {
      Float.NaN
    }
  }
  dataset
    .join(userFactors,
      checkedCast(dataset($(userCol)).cast(DoubleType)) === userFactors("id"), "left")
    .join(itemFactors,
      checkedCast(dataset($(itemCol)).cast(DoubleType)) === itemFactors("id"), "left")
    .select(dataset("*"),
      predict(userFactors("features"), itemFactors("features")).as($(predictionCol)))
}
```

According to the Jenkins test failure log, this is the last hurdle. However, it shows that the usage of UDF on aggregation is widespread. Spark users might depend on this risky feature even more.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220547752 Retest this please
[GitHub] spark pull request: [SPARK-15282][SQL] Make ScalaUDF nondeterminis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-220572951 I updated `ALS` and `ALSSuite` just so that the Jenkins tests pass and we can continue the discussion.
[GitHub] spark pull request: [SPARK-15462][SQL][TEST] `unresolved === false...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13241
[SPARK-15462][SQL][TEST] `unresolved === false` is enough in testcases.
## What changes were proposed in this pull request?
In the `catalyst` module alone, there exist 7 evaluation test cases on unresolved expressions. In real-world situations, however, those cases do not happen because exceptions are raised before evaluation.
```scala
scala> sql("select format_number(null, 3)")
res0: org.apache.spark.sql.DataFrame = [format_number(CAST(NULL AS DOUBLE), 3): string]

scala> sql("select format_number(cast(null as NULL), 3)")
org.apache.spark.sql.catalyst.parser.ParseException:
DataType null() is not supported.(line 1, pos 34)
```
This PR makes those testcases more realistic.
```scala
-checkEvaluation(FormatNumber(Literal.create(null, NullType), Literal(3)), null)
+assert(FormatNumber(Literal.create(null, NullType), Literal(3)).resolved === false)
```
This PR also removes the redundant `resolved` check in the `FoldablePropagation` optimizer.
## How was this patch tested?
Pass the modified Jenkins tests.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-15462
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13241.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13241
commit 9374936b0cd1ecf1e080c530a3f7c691188a61fd Author: Dongjoon Hyun <dongj...@apache.org> Date: 2016-05-21T07:13:18Z [SPARK-15462][SQL][TEST] `unresolved === false` is enough in testcases.
[GitHub] spark pull request: [SPARK-15462][SQL][TEST] `unresolved === false...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13241#issuecomment-220763620 cc @cloud-fan .
[GitHub] spark pull request: [SPARK-15282][SQL][DOCS] Add notes of the dete...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-221049934 Hi, @marmbrus . I replaced 'should' with 'must', and added the detailed description to `functions.py`, `SQLContext.scala`, `SparkSession.scala` and `UserDefinedFunction.scala`, too.
[GitHub] spark pull request: [SPARK-15282][SQL][DOCS] Add notes of the dete...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13087#discussion_r64259325
--- Diff: python/pyspark/sql/functions.py ---
@@ -1756,6 +1756,7 @@ def __call__(self, *cols):
 @since(1.3)
 def udf(f, returnType=StringType()):
     """Creates a :class:`Column` expression representing a user defined function (UDF).
+    Note that the user-defined functions should be deterministic.
--- End diff --
Thank you. I see. I'll fix it like that.
[GitHub] spark pull request: [SPARK-15282][SQL][DOCS] Add notes of the dete...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-221093278 Hi, @marmbrus . Instead of creating a new JIRA, I think we had better change the title of this PR to `[MINOR][SQL][DOC] ...`. Initially, I tried to handle @linbojin 's SPARK-15282, but this PR no longer does. What do you think about that? Of course, @linbojin can add the detailed description to that JIRA issue.
[GitHub] spark pull request: [MINOR][SQL][DOCS] Add notes of the determinis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-221098550 Oops. I didn't change the title yet.
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221425046 Thank you for the review, @rxin . I'll check whether they need this.
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221437321 retest this please
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221462008 Thank you, @rxin !
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221431064 Hi, @rxin . I added that to catalyst/DataSetSuite/DataFrameSuite, too. So far, I cannot find a proper place in Analyzer. If you don't mind, could you give me some pointers?
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221435072
It does not seem necessary, since this PR prevents it at the level of the `Repartition` constructor.
```
case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
  extends UnaryNode {
+  require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
  override def output: Seq[Attribute] = child.output
}
```
Please see the updated description of this PR, and let me know if I missed some of the cases you mentioned.
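The effect of a constructor-level `require` can be sketched without Spark. The case class below is a hypothetical stand-in that drops the `child: LogicalPlan` field and the `UnaryNode` parent, and `constructionError` is a helper introduced only for this illustration. It shows that an invalid value is rejected when the object is built, before any analysis or execution could see it.

```scala
import scala.util.Try

object RepartitionSketch {
  // Hypothetical stand-in for the logical plan node; it has no child plan.
  final case class Repartition(numPartitions: Int, shuffle: Boolean) {
    require(numPartitions > 0, s"Number of partitions ($numPartitions) must be positive.")
  }

  // Returns the error message if construction fails, None otherwise.
  def constructionError(numPartitions: Int): Option[String] =
    Try(Repartition(numPartitions, shuffle = true)).failed.toOption.map(_.getMessage)
}
```

Here `RepartitionSketch.constructionError(0)` yields `Some("requirement failed: Number of partitions (0) must be positive.")`, because Scala's `require` throws an `IllegalArgumentException` with the `requirement failed: ` prefix, while `constructionError(4)` yields `None`.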
[GitHub] spark pull request: [MINOR][SQL][DOCS] Add notes of the determinis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-221099176 I'm not sure what happened. I'll remove this PR information from @linbojin 's JIRA issue anyway.
[GitHub] spark pull request: [MINOR][SQL][DOCS] Add notes of the determinis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-22101 Thank you so much!
[GitHub] spark pull request: [MINOR][SQL][DOCS] Add notes of the determinis...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13087#issuecomment-221101369 Thank you all for reviewing and helping with this PR!
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220134254 In addition, sorry that I wrote a wrong example before: `checkEvaluation(FormatNumber(Literal(4.asInstanceOf[Byte]), Literal(3)), "4.000")`.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220128276
Hi, @cloud-fan . I found the root cause. There is an exceptional case for **Literal.create(null, NullType)**.
```scala
scala> import org.apache.spark.sql.types._
scala> import org.apache.spark.sql.catalyst.expressions._

scala> FormatNumber(Literal.create(null, NullType), Literal(3)).resolved
res0: Boolean = false

// lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
scala> FormatNumber(Literal.create(null, NullType), Literal(3)).childrenResolved
res1: Boolean = true

scala> FormatNumber(Literal.create(null, NullType), Literal(3)).checkInputDataTypes().isSuccess
res2: Boolean = false
```
Due to the **NullType**, the `ExpectsInputTypes` check fails with the cause: `argument 1 requires numeric type, however, 'NULL' is of null type`. For **FormatNumber(Literal.create(null, IntegerType), Literal(3))**, `resolved` is true, and we have no problem. In short, an `unresolved` situation occurs due to the type mismatch. It's inevitable and a real-world corner case. To keep `Optimizer` safe, we should keep the `resolved` check.
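The interplay between `childrenResolved` and the input type check can be modelled with a toy expression tree. Everything below is a simplified, hypothetical model written for this note, not Catalyst code: the `Expr` trait, the three data types, and `inputTypeError` are assumptions, and only `IntegerType` is treated as numeric. It reproduces the shape of the REPL session, where the children are resolved but the parent is not.

```scala
object ResolvedSketch {
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  trait Expr {
    def dataType: DataType
    def children: Seq[Expr]
    // None means the input types are acceptable; Some(msg) is a type-check failure.
    def inputTypeError: Option[String] = None
    def childrenResolved: Boolean = children.forall(_.resolved)
    // Same shape as the definition quoted in the comment above.
    lazy val resolved: Boolean = childrenResolved && inputTypeError.isEmpty
  }

  final case class Literal(value: Any, dataType: DataType) extends Expr {
    val children: Seq[Expr] = Nil
  }

  final case class FormatNumber(x: Expr, d: Expr) extends Expr {
    val dataType: DataType = StringType
    val children: Seq[Expr] = Seq(x, d)
    // Toy rule: the first argument must be IntegerType.
    override def inputTypeError: Option[String] =
      if (x.dataType == IntegerType) None
      else Some(s"argument 1 requires numeric type, however, it is of ${x.dataType}")
  }
}
```

In this model, `FormatNumber(Literal(null, NullType), Literal(3, IntegerType))` has `childrenResolved == true` and `resolved == false`, while the same expression with `Literal(null, IntegerType)` is resolved.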
[GitHub] spark pull request: [SPARK-15373][WEB UI] Spark UI should show con...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13158#issuecomment-220173646 Hi, @zsxwing . Finally, it passes the Jenkins test. :)
[GitHub] spark pull request: [SPARK-15373][WEB UI] Spark UI should show con...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13158#issuecomment-220175949 Oh, thank you, @srowen !
[GitHub] spark pull request: [SPARK-15058][MLLIB][TEST] Enable Java Decisio...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12840#issuecomment-220161025 Hi, @jkbradley , @mengxr , @MLnick , @rxin , @srowen . Such a long silence definitely means I did this in the wrong way, doesn't it? There were already two PRs (this PR and #13100 ) about this. If so, I had better close this PR and the JIRA explicitly. Sorry for touching this. If there are no comments by tonight, I will close my PR and the JIRA as WONTFIX. Maybe, someday, one of the ML committers is going to finish this correctly. Thank you in any case.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220212842 For the second suggestion, `the optimizer is not tested but skipped`, do you mean skipping the `FoldablePropagation` optimizer?
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220216995 Thank you for understanding. I'll try to handle those test issues in another PR.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220207591 No, there are more test suite failures. That is just one example.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217482
Oh, amazing. According to the last Jenkins results, the seven test failures in `catalyst` are all of them.
```
[info] *** 7 TESTS FAILED ***
[error] Failed: Total 1656, Failed 7, Errors 0, Passed 1649, Ignored 1
[error] Failed tests:
[error] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite
[error] org.apache.spark.sql.catalyst.expressions.CastSuite
[error] (catalyst/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 222 s, completed May 18, 2016 8:11:07 PM
```
Anyway, I will handle them in another PR.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220207782 Actually, I made two separate Jenkins runs to show you the comparison. Those fail on MiMa errors.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220207983 May I roll back the last commit? Let's see the Jenkins result. I think it's worth doing that.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220210913 Should I touch those too? I agree with you that this situation will not occur in **real** testcases. Maybe, only `catalyst`-related problems?
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220208494 Here is another test run (without the `resolved` check and without the test case `checkEvaluation(FormatNumber(Literal.create(null, NullType), Literal(3)), null)`).
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220210664
@cloud-fan . Here is the result of `catalyst` first.
- **catalyst**: 7 failures
  - DateExpressionsSuite: 3 failures
  - CastSuite: 4 failures
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220211380 Okay, I will proceed with modifying `checkEvaluation`. Thank you for the fast decision, @cloud-fan .
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220212554
Oops. Analyzing by modifying `checkEvaluation` is not helpful in this case. For example, in `CastSuite`:
```
{
  val ret = cast(array_notNull, ArrayType(BooleanType, containsNull = false))
  assert(ret.resolved === false)
  checkEvaluation(ret, Seq(null, true, false))
}
```
The test case is intentionally designed to create an 'unresolved' case and check the result.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220213423 Hmm. @cloud-fan . What about simply keeping the `resolved` check? IMHO, it just provides robustness. And, in fact, I'm reluctant to change test suites when adding a new feature.
[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12719#issuecomment-220217246 I removed the last test commit.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r63916714
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala ---
@@ -124,6 +124,7 @@ object GenerateMutableProjection extends CodeGenerator[Seq[Expression], MutableP
       public java.lang.Object apply(java.lang.Object _i) {
         InternalRow ${ctx.INPUT_ROW} = (InternalRow) _i;
+        /*** project list: ${expressions.map(_.toCommentSafeString).mkString(", ")} */
--- End diff --
No problem. But it is directly related to the intention of @rxin 's original PR, too. I think we need to reach a consensus on how to improve that. I'll wait and collect other people's opinions until this afternoon, and then do that.
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r63918627
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala ---
@@ -124,6 +124,7 @@ object GenerateMutableProjection extends CodeGenerator[Seq[Expression], MutableP
       public java.lang.Object apply(java.lang.Object _i) {
         InternalRow ${ctx.INPUT_ROW} = (InternalRow) _i;
+        /*** project list: ${expressions.map(_.toCommentSafeString).mkString(", ")} */
--- End diff --
Thank you for the fast decision, @rxin ! :)
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r63915288
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -24,13 +24,13 @@ package org.apache.spark.sql.catalyst.expressions.codegen
  * Written by Matei Zaharia.
  */
 object CodeFormatter {
-  def format(code: String): String = new CodeFormatter().addLines(code).result()
+  def format(code: String): String = new CodeFormatter().addLines(stripExtraNewLines(code)).result()
--- End diff --
Thank you for the review, @davies . Oh, I thought `CodeFormatter.format` is called before Janino and the Guava loading cache, too. I'll make that consistent this afternoon. Then it should be okay.
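As a rough idea of what stripping extra newlines from generated code before formatting could involve, here is a minimal sketch. It is not the `stripExtraNewLines` from the PR, whose exact rules are not shown in this thread; `NewlineSketch` and `collapseBlankLines` are hypothetical names, and the rule chosen here (keep at most one blank line in a row) is an assumption.

```scala
object NewlineSketch {
  // Hypothetical helper: keeps at most one blank line between non-blank lines.
  def collapseBlankLines(code: String): String = {
    val out = new StringBuilder
    var previousBlank = false
    for (line <- code.split("\n")) {
      val blank = line.trim.isEmpty
      // Skip a blank line only when the line before it was also blank.
      if (!(blank && previousBlank)) out.append(line).append('\n')
      previousBlank = blank
    }
    out.toString
  }
}
```

With this rule, `NewlineSketch.collapseBlankLines("a\n\n\n\nb")` returns `"a\n\nb\n"`. Normalizing the text before it is used as a cache key would make two generated sources that differ only in blank lines compare equal, which is one reason to apply such a step consistently wherever the code string is consumed.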