[GitHub] spark pull request #15914: delete temporary folder after insert hive table
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/15914 delete temporary folder after insert hive table

## What changes were proposed in this pull request?

Modifies the code of InsertIntoHiveTable.scala to fix https://issues.apache.org/jira/browse/SPARK-14974

## How was this patch tested?

This patch can be tested manually.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark SPARK-14974-20161117 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15914.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15914

commit cb08136733b4b4dc48e488e33525dcebb715a75f Author: baishuo Date: 2016-11-17T06:35:29Z delete temporary folder after insert hive table

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
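The general shape of such a fix — deleting the job's temporary staging directory once the insert has been committed — can be sketched as below. This is a hedged illustration, not the actual InsertIntoHiveTable.scala change: Spark's real code uses Hadoop's `FileSystem.delete(path, true)`, and `java.nio` stands in for it here.

```java
import java.io.IOException;
import java.nio.file.*;

public class TmpCleanup {
    // Recursively delete a temporary staging directory after the insert
    // has been committed, so stale part files do not accumulate.
    static void deleteTmpDir(Path tmp) throws IOException {
        if (Files.exists(tmp)) {
            // Walk depth-first in reverse order so files are removed
            // before the directories that contain them.
            try (java.util.stream.Stream<Path> walk = Files.walk(tmp)) {
                walk.sorted(java.util.Comparator.reverseOrder())
                    .forEach(p -> p.toFile().delete());
            }
        }
    }
}
```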
[GitHub] spark issue #14262: [SPARK-14974][SQL]delete temporary folder after insert h...
Github user baishuo commented on the issue: https://github.com/apache/spark/pull/14262 Closing this and opening the same change rebased on the new master branch: https://github.com/apache/spark/pull/15914
[GitHub] spark pull request #14262: [SPARK-14974][SQL]delete temporary folder after i...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/14262
[GitHub] spark pull request: [SPARK-5084][SQL]add if not exists after creat...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/3895#issuecomment-68840957 I have modified some code and tested it locally.
[GitHub] spark pull request: [SPARK-4908][SQL][hotfix]narrow the scope of s...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/4001 [SPARK-4908][SQL][hotfix]narrow the scope of synchronized for PR 3834 Compared with https://github.com/apache/spark/pull/3834, this PR narrows the scope of the synchronized block. You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark SPARK-4908-20141231 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4001.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4001 commit 4bfa3067f6d1494c770d49375498cf1b4adbaa45 Author: baishuo Date: 2015-01-12T07:06:14Z narrow the scope of synchronized for PR 3834
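The general idea of narrowing a synchronized scope can be sketched as follows. This is an illustrative example, not the actual diff of PR 3834/4001: the point is that the lock should guard only the shared state, not the expensive work.

```java
import java.util.HashMap;
import java.util.Map;

public class NarrowLock {
    private final Map<String, String> cache = new HashMap<>();

    // Broad scope: the lock is held for the whole method, so the
    // (potentially expensive) computation also serializes all callers.
    synchronized String lookupBroad(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = expensiveCompute(key);
            cache.put(key, v);
        }
        return v;
    }

    // Narrow scope: the lock guards only the shared-map accesses; the
    // computation runs unlocked (at the cost of possible duplicate work
    // when two threads miss on the same key at the same time).
    String lookupNarrow(String key) {
        synchronized (cache) {
            String v = cache.get(key);
            if (v != null) return v;
        }
        String v = expensiveCompute(key);
        synchronized (cache) {
            cache.put(key, v);
        }
        return v;
    }

    private String expensiveCompute(String key) {
        return key.toUpperCase();  // stand-in for real work
    }
}
```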
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/4001#issuecomment-69883507 Hi @liancheng and @marmbrus, I have removed [hotfix] from the title.
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/4001#issuecomment-69884566 Indeed, the code passed all the tests when I ran them locally. I added [hotfix] to the title just to indicate that this is not the final solution for [SPARK-4908].
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/4001#issuecomment-70046499 Hi @marmbrus, can this PR be merged? :)
[GitHub] spark pull request: [SPARK-5084][SQL]add if not exists after creat...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/3895#issuecomment-70046564 Hi @marmbrus, can this PR be merged? :)
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/4001#issuecomment-70778989 No problem, closing it :)
[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/4001
[GitHub] spark pull request: Update BasicOperationsSuite.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1084#issuecomment-46786699 Let me check.
[GitHub] spark pull request: Update BasicOperationsSuite.scala
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/1084
[GitHub] spark pull request: Update BasicOperationsSuite.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1084#issuecomment-47611820 Closed.
[GitHub] spark pull request: Update SQLConf.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/1272 Update SQLConf.scala Use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1272.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1272 commit 0740f28b04a43ac739d6f45a7ffc6fa23fe7b96c Author: baishuo Date: 2014-07-01T03:29:12Z Update SQLConf.scala: use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap
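The trade-off this PR touches can be sketched as follows; this is an illustrative example, not the SQLConf code itself. `Collections.synchronizedMap` locks the entire map on every call and still requires caller-side locking for compound check-then-act sequences, while `ConcurrentHashMap` allows concurrent readers and provides atomic compound operations such as `putIfAbsent`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapChoice {
    // synchronizedMap: every operation takes the map-wide lock, and a
    // check-then-act sequence is only atomic if the caller holds the
    // lock explicitly across both steps.
    static String viaSynchronizedMap() {
        Map<String, String> settings =
            Collections.synchronizedMap(new HashMap<>());
        synchronized (settings) {  // required for atomic check-then-act
            if (!settings.containsKey("k")) {
                settings.put("k", "v1");
            }
        }
        return settings.get("k");
    }

    // ConcurrentHashMap: the same compound operation is atomic out of
    // the box, with no caller-side locking.
    static String viaConcurrentMap() {
        ConcurrentHashMap<String, String> settings = new ConcurrentHashMap<>();
        settings.putIfAbsent("k", "v1");
        return settings.get("k");
    }
}
```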
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47621874 I have added some synchronized blocks; please check whether they are thread safe, and Jenkins should test this once more.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47625269 Thanks @aarondav, I have modified it according to your comment; please help me check whether it is proper.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47627343 Hi @rxin, I have removed the indentation on `def set(props: Properties): Unit = { props.asScala.foreach { case (k, v) => this.settings.put(k, v) } }`; please help me check whether it is proper.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on a diff in the pull request: https://github.com/apache/spark/pull/1272#discussion_r14392768 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -50,8 +50,7 @@ trait SQLConf { /** ** SQLConf functionality methods */ @transient - private val settings = java.util.Collections.synchronizedMap( -new java.util.HashMap[String, String]()) + private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]() --- End diff -- Reverted to Collections.synchronizedMap.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47634643 Hi @rxin, what is the proper way to modify this? Should I use `settings.synchronized { ... }` to ensure thread safety?
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-47865791 Modified according to @cloud-fan's comment.
[GitHub] spark pull request: Update SQLConf.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1272#issuecomment-48005584 Oh, sorry about the compile error; I have changed orElse to getOrElse. Thank you @rxin.
[GitHub] spark pull request: Update MultiInstanceRelation.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/1312 Update MultiInstanceRelation.scala I think that if multiAppearance is empty, we can return the plan directly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark test-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1312.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1312 commit b779773356ba475085ea425a1c8a23048b13fd4f Author: baishuo Date: 2014-07-07T02:26:43Z Update MultiInstanceRelation.scala: if multiAppearance is empty, we can return the plan directly
[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1312#issuecomment-48352678 Can one of the admins verify this patch? :)
[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1312#issuecomment-48421247 Thank you @marmbrus, closing this PR.
[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/1312
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/1569 Update HiveMetastoreCatalog.scala I think it's better to define hiveQlTable as a val. You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1569.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1569 commit a7b32a28a59886dfac45331d781a548fc18b098f Author: baishuo Date: 2014-07-24T07:01:33Z Update HiveMetastoreCatalog.scala: define hiveQlTable as a val
[GitHub] spark pull request: [SQL]Update HiveMetastoreCatalog.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1569#issuecomment-50101782 Modified the title to add [SQL].
[GitHub] spark pull request: [SQL]Update HiveMetastoreCatalog.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1569#issuecomment-50180682 Thank you @marmbrus. I have changed it to `@transient lazy val` and then ran `sbt/sbt catalyst/test sql/test hive/test` on the master branch; all tests passed. With only a plain `val`, the tests do not pass. The following is the tail of the test output:
[info] - Partition pruning - with filter on string partition key - query test
[info] - Partition pruning - with filter on int partition key - pruning test
[info] - Partition pruning - with filter on int partition key - query test
[info] - Partition pruning - left only 1 partition - pruning test
[info] - Partition pruning - left only 1 partition - query test
[info] - Partition pruning - all partitions pruned - pruning test
[info] - Partition pruning - all partitions pruned - query test
[info] - Partition pruning - pruning with both column key and partition key - pruning test
[info] - Partition pruning - pruning with both column key and partition key - query test
[info] HiveResolutionSuite:
[info] - table.attr
[info] - database.table
[info] - database.table table.attr
[info] - alias.attr
[info] - subquery-alias.attr
[info] - quoted alias.attr
[info] - attr
[info] - alias.star
[info] - case insensitivity with scala reflection
[info] - nested repeated resolution
[info] BigDataBenchmarkSuite:
[info] - No data files found for BigDataBenchmark tests. !!! IGNORED !!!
[info] ScalaTest
[info] Run completed in 2 minutes, 55 seconds.
[info] Total number of tests run: 150
[info] Suites: completed 14, aborted 0
[info] Tests: succeeded 150, failed 0, canceled 0, ignored 7, pending 0
[info] All tests passed.
[info] Passed: Total 150, Failed 0, Errors 0, Passed 150, Ignored 7
[success] Total time: 267 s, completed Jul 25, 2014 10:17:09 AM
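Why a `@transient lazy val` can behave differently from a plain `val` here: a lazy field is computed on first access, after the enclosing object is fully constructed, and a transient field is recomputed after deserialization rather than shipped with the object. A hedged Java sketch of the lazy-initialization half (the class and field names are illustrative, not Spark's):

```java
import java.util.function.Supplier;

public class Lazy<T> {
    private transient T cached;         // not serialized; rebuilt on demand
    private final Supplier<T> compute;

    public Lazy(Supplier<T> compute) {
        this.compute = compute;         // nothing expensive runs at construction
    }

    public synchronized T get() {
        if (cached == null) {
            cached = compute.get();     // first access triggers the work
        }
        return cached;                  // later accesses reuse the cached value
    }
}
```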
[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/2842 [SPARK-3999][deploy] resolve the wrong number of arguments for pattern error AssociationErrorEvent, which is provided by akka-remote_2.10-2.2.3-shaded-protobuf.jar, has only 4 arguments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark testAkka Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2842.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2842 commit ab328948c5efca9807ad4342c63047a2b1889197 Author: baishuo Date: 2014-10-18T16:13:37Z modify the number of arguments
[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2842#issuecomment-59688527 @JoshRosen @pwendell I found the reason for this problem. In IDEA, I should right-click the project and click Maven -> Reimport.
[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/2842
[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/2876 [SPARK-4034]change the scope of guava to compile After clicking maven->reimport for the spark project in IDEA and then starting "sparksqlclidriver" in IDEA, we get an exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.spark.util.Utils$.(Utils.scala:611)
at org.apache.spark.util.Utils$.(Utils.scala)
at org.apache.spark.SparkContext.(SparkContext.scala:178)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:36)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:256)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:149)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
This is caused by the fact that after maven->reimport is clicked, the scope of guava*.jar in the project spark-hive-thriftserver is changed to provided (right-click the project spark-hive-thriftserver and choose the Dependencies tab to see each jar's scope in this project). We can change it to "compile" and re-start SparkSQLCLIDriver, and the exception disappears. But if we re-run maven->reimport, the scope of guava*.jar will return to "provided". You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark patch-4034-pom-provided Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2876.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2876 commit 17c41b4552dfef37ad6d89498546695e066268dd Author: baishuo Date: 2014-10-21T10:14:55Z change the scope of guava to compile
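The scope change under discussion amounts to a one-line edit in the Maven dependency declaration. A hypothetical pom.xml fragment (the exact coordinates and surrounding elements in Spark's poms may differ):

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- "provided" keeps Guava off the packaged runtime classpath, but the
       IDE then drops it after a Maven reimport; "compile" bundles it and
       lets SparkSQLCLIDriver launch directly from the IDE. -->
  <scope>compile</scope>
</dependency>
```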
[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2876#issuecomment-59907899 I think the root cause is that the scope of guava in the root pom.xml is "provided"; every time we do a reimport (right-click the whole project, click maven->Reimport), the scope is set back to "provided", which causes the exception. If we change it to "compile", the exception never occurs.
[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2876#issuecomment-59910134 Hi @srowen and @vanzin, if we do not do a Reimport, there is no problem. But if we do (Reimport helps IDEA refresh the jars) and then run SparkSQLCLIDriver, the exception occurs. And I think someone who hits this exception may not be able to find a way to resolve it quickly.
[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2876#issuecomment-60341700 Hi @vanzin, I have modified 4 pom.xml files, changing the scope of guava to "provided" in the root pom.xml, and all tests of the sql project pass. Can this change be tested?
[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/2157
[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2157#issuecomment-53669323 Yes, no problem :) Closing this issue.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/2226 [SPARK-3007][SQL]Add "Dynamic Partition" support to Spark Sql hive A new PR based on the new master; the changes are the same as https://github.com/apache/spark/pull/1919. You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark patch-3007 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2226.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2226
commit d3e206e1a2fadc271e365462bd93730e31a094eb Author: baishuo Date: 2014-08-12T17:27:54Z Update HiveQl.scala
commit b22857a365925a428c41dd3e93d0da3613053071 Author: baishuo Date: 2014-08-12T17:29:36Z Update SparkHadoopWriter.scala
commit bade51d4726b8c55de83fef5c3e42c48f5af8f59 Author: baishuo Date: 2014-08-12T17:31:01Z Update InsertIntoHiveTable.scala
commit d211d330550260d93752349682e7c8447691a9e5 Author: baishuo Date: 2014-08-12T17:53:04Z Update InsertIntoHiveTable.scala
commit f0f620d277ecc7e342c42d88e5b12062eecd8261 Author: baishuo Date: 2014-08-18T06:29:21Z Update HiveCompatibilitySuite.scala
commit 412a48b185785dafb7a0ff450018e65dde7c4189 Author: baishuo Date: 2014-08-18T06:34:53Z Update InsertIntoHiveTable.scala
commit 567972c2c4ff85e9d09b2c75fbffe5891b438b1c Author: baishuo Date: 2014-08-18T06:36:58Z Update HiveQuerySuite.scala
commit 8e51a4bc47a1f5517e99dd1ebb456ae95376d8c2 Author: baishuo Date: 2014-08-18T07:18:07Z Update Cast.scala
commit b80f2021eca650b29a7baad35ba61ece90a7fc54 Author: baishuo Date: 2014-08-18T07:44:07Z Update InsertIntoHiveTable.scala
commit 924042c3118337bb6a944e0d4e3ece46ec65dd83 Author: baishuo Date: 2014-08-18T07:57:20Z Update Cast.scala
commit af8411aeefeae90fb5c79b88b38a5d299b11ddff Author: baishuo Date: 2014-08-19T16:01:49Z update file after test
commit 0c324beaa38abfd089257466a0a0ddd6e57c5fad Author: baishuo Date: 2014-08-19T17:14:53Z do a little modify
commit 2a0e0b82cacf50552de60aead7b25e04323cd0f9 Author: baishuo Date: 2014-09-01T06:28:17Z for dynamic partition Merge branch 'patch-1' into patch-3007
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-54032088 Hi @
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/1919
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-54244065 No problem, closing this PR.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54259701 Hi @marmbrus and @liancheng, the latest code has passed "dev/lint-scala" and "sbt/sbt catalyst/test sql/test hive/test" locally.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54574495 Can this PR be tested? The golden files related to HiveCompatibilitySuite already exist in the master branch of Spark, so there is no need to add them.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54575671 Can this PR be tested? :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17287305

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -101,62 +103,135 @@ case class InsertIntoHiveTable(
   }

   def saveAsHiveFile(
-      rdd: RDD[Writable],
+      rdd: RDD[(Writable, String)],
       valueClass: Class[_],
       fileSinkConf: FileSinkDesc,
-      conf: JobConf,
-      isCompressed: Boolean) {
+      conf: SerializableWritable[JobConf],
+      isCompressed: Boolean,
+      dynamicPartNum: Int) {
     if (valueClass == null) {
       throw new SparkException("Output value class not set")
     }
-    conf.setOutputValueClass(valueClass)
+    conf.value.setOutputValueClass(valueClass)
     if (fileSinkConf.getTableInfo.getOutputFileFormatClassName == null) {
       throw new SparkException("Output format class not set")
     }

     // Doesn't work in Scala 2.9 due to what may be a generics bug
     // TODO: Should we uncomment this for Scala 2.10?
     // conf.setOutputFormat(outputFormatClass)
-    conf.set("mapred.output.format.class", fileSinkConf.getTableInfo.getOutputFileFormatClassName)
+    conf.value.set("mapred.output.format.class",
+      fileSinkConf.getTableInfo.getOutputFileFormatClassName)
     if (isCompressed) {
       // Please note that isCompressed, "mapred.output.compress", "mapred.output.compression.codec",
       // and "mapred.output.compression.type" have no impact on ORC because it uses table properties
       // to store compression information.
-      conf.set("mapred.output.compress", "true")
+      conf.value.set("mapred.output.compress", "true")
       fileSinkConf.setCompressed(true)
-      fileSinkConf.setCompressCodec(conf.get("mapred.output.compression.codec"))
-      fileSinkConf.setCompressType(conf.get("mapred.output.compression.type"))
+      fileSinkConf.setCompressCodec(conf.value.get("mapred.output.compression.codec"))
+      fileSinkConf.setCompressType(conf.value.get("mapred.output.compression.type"))
     }
-    conf.setOutputCommitter(classOf[FileOutputCommitter])
-    FileOutputFormat.setOutputPath(
-      conf,
-      SparkHiveHadoopWriter.createPathFromString(fileSinkConf.getDirName, conf))
+    conf.value.setOutputCommitter(classOf[FileOutputCommitter])
+    FileOutputFormat.setOutputPath(
+      conf.value,
+      SparkHiveHadoopWriter.createPathFromString(fileSinkConf.getDirName, conf.value))
     log.debug("Saving as hadoop file of type " + valueClass.getSimpleName)

+    var writer: SparkHiveHadoopWriter = null
+    // Map that stores the writers for dynamic partitions
+    var writerMap: scala.collection.mutable.HashMap[String, SparkHiveHadoopWriter] = null
+    if (dynamicPartNum == 0) {
+      writer = new SparkHiveHadoopWriter(conf.value, fileSinkConf)
+      writer.preSetup()
+    } else {
+      writerMap = new scala.collection.mutable.HashMap[String, SparkHiveHadoopWriter]
+    }
-    val writer = new SparkHiveHadoopWriter(conf, fileSinkConf)
-    writer.preSetup()
-
-    def writeToFile(context: TaskContext, iter: Iterator[Writable]) {
-      // Hadoop wants a 32-bit task attempt ID, so if ours is bigger than Int.MaxValue, roll it
-      // around by taking a mod. We expect that no task will be attempted 2 billion times.
-      val attemptNumber = (context.attemptId % Int.MaxValue).toInt
-
+    def writeToFile(context: TaskContext, iter: Iterator[(Writable, String)]) {
+      // Hadoop wants a 32-bit task attempt ID, so if ours is bigger than Int.MaxValue, roll it
+      // around by taking a mod. We expect that no task will be attempted 2 billion times.
+      val attemptNumber = (context.attemptId % Int.MaxValue).toInt
+      // writer for no dynamic partition
+      if (dynamicPartNum == 0) {
         writer.setup(context.stageId, context.partitionId, attemptNumber)
         writer.open()
+      }
-      var count = 0
-      while(iter.hasNext) {
-        val record = iter.next()
-        count += 1
-        writer.write(record)
+      var count = 0
+      // writer for dynamic partition
+      var writer2: SparkHiveHadoopWriter = null
+      while(iter.hasNext) {
+        val record = iter.next()
+        count += 1
+        if (record._2 == null) { // without dynamic partition
+          writer.write(record._1)
+        } else { // for dynamic partition
+          val location = fileSinkConf.getDirName
+          val partLocation = loca
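The writerMap bookkeeping in the diff above boils down to a small cache: one writer per dynamic-partition directory, created on first use and reused for every later row of that partition. A minimal sketch of the idea, with hypothetical names (not the PR's actual code):

```scala
import scala.collection.mutable

// Hypothetical simplification of the writerMap pattern above: rows arrive
// tagged with a partition path; the first row seen for a path creates a
// writer, and later rows for the same path reuse it.
class PartitionWriterCache[W](newWriter: String => W) {
  private val writers = mutable.HashMap.empty[String, W]

  // Return the writer for this partition path, creating it if absent.
  def writerFor(partPath: String): W =
    writers.getOrElseUpdate(partPath, newWriter(partPath))

  // All writers, e.g. to commit and close them at the end of the task.
  def all: Iterable[W] = writers.values
}
```

Calling writerFor("/part2=c1") twice returns the same instance, which is what keeps all rows of one dynamic partition in one output directory.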
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17290567

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -178,6 +253,40 @@ case class InsertIntoHiveTable(
     val tableLocation = table.hiveQlTable.getDataLocation
     val tmpLocation = hiveContext.getExternalTmpFileURI(tableLocation)
     val fileSinkConf = new FileSinkDesc(tmpLocation.toString, tableDesc, false)
+    var tmpDynamicPartNum = 0
+    var numStaPart = 0
+    val partitionSpec = partition.map {
+      case (key, Some(value)) =>
+        numStaPart += 1
+        key -> value
+      case (key, None) =>
+        tmpDynamicPartNum += 1
+        key -> ""
--- End diff --

The Hive API will handle the ""; when Hive sees that a partition value is "", it knows that partition is dynamic.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54949644 Hi @marmbrus, thanks a lot for your advice. I have modified the code accordingly. I tried to separate the dynamic partition support by using the condition "if (dynamicPartNum == 0)" twice: once in saveAsHiveFile and once in writeToFile. Please help me check whether it is proper. Thank you :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54949823

I will try to explain my design idea (the code is mostly in InsertIntoHiveTable.scala). Let's assume there is a table called table1, which has two columns, col1 and col2, and two partitions, part1 and part2.

First: in the case of inserting data into a static partition only, when "saveAsHiveFile" finishes, the data has been written to a temporary location, a directory like /tmp/hive-root/hive_/-ext-1; let's call it TMPLOCATION. Under TMPLOCATION there is a subdirectory /part1=.../part2=..., and all data is stored under TMPLOCATION/part1=.../part2=... Spark then calls the Hive API "loadPartition" to move the files to {hivewarehouse}/{tablename}/part1=.../part2=... and update the metadata, and the whole process is done.

If we want to implement dynamic partitioning, we need to use the Hive API "loadDynamicPartitions" to move the data and update the metadata. But the directory layout that "loadDynamicPartitions" requires differs slightly from that of "loadPartition":

1: In the case of one static partition and one dynamic partition (HQL like "insert overwrite table table1 partition(part1=val1,part2) select a,b,c from ..."), loadDynamicPartitions needs the temporary data located at TMPLOCATION/part2=c1 (there is NO "part1=val1"; it will be added during loadDynamicPartitions), TMPLOCATION/part2=c2, ... loadDynamicPartitions will move them to {hivewarehouse}/{tablename}/part1=val1/part2=c1, {hivewarehouse}/{tablename}/part1=val1/part2=c2, and update the metadata. Note that in this case loadDynamicPartitions does not need a subdirectory like part1=val1 under TMPLOCATION.

2: In the case of zero static partitions and two dynamic partitions (HQL like "insert overwrite table table1 partition(part1,part2) select a,b,x,c from ..."), loadDynamicPartitions needs the temporary data located at TMPLOCATION/part1=../part2=c1, TMPLOCATION/part1=../part2=c2, ..., and loadDynamicPartitions will move them to {hivewarehouse}/{tablename}/part1=../part2=... So whether there is a static partition in the HQL determines how we create subdirectories under TMPLOCATION. That is why the function "getDynamicPartDir" exists.

Second: where shall we call "getDynamicPartDir"? It must be a place where we can get the values of the dynamic partitions, so we call this function in "iter.map { row => ... }" inside the closure of "val rdd = childRdd.mapPartitions". Once we have the row, we can get the values of the dynamic partitions. After we get the dynamicPartPath from getDynamicPartDir, we pass it to the next RDD through this RDD's output: serializer.serialize(outputData, standardOI) -> dynamicPartPath (for a static partition, dynamicPartPath is null).

When the next RDD (the closure in writeToFile) gets the data and the dynamicPartPath, we check whether dynamicPartPath is null. If it is not null, we check whether a corresponding writer already exists in writerMap, which stores a writer for each partition; if it does, we use that writer to write the record. This ensures that data belonging to the same partition is written to the same directory. loadDynamicPartitions requires that there be no other files under TMPLOCATION except the subdirectories for the dynamic partitions; that is why there are several "if (dynamicPartNum == 0)" checks in writeToFile.
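The directory rule described above (dynamic partition columns appear in the temporary path, static ones do not) can be sketched as a tiny helper. This is an illustrative reimplementation with assumed names, not the PR's actual getDynamicPartDir:

```scala
// Illustrative sketch of the layout rule: build the temporary subdirectory
// for one row. Static partitions (value fixed in the HQL) are skipped --
// loadDynamicPartitions adds them later; dynamic partitions take their
// value from the row being written.
def dynamicPartDir(
    partCols: Seq[String],                     // partition columns, in order
    partSpec: Map[String, Option[String]],     // Some(v) = static, None = dynamic
    rowValues: Map[String, String]): String =  // this row's dynamic values
  partCols.flatMap { col =>
    partSpec(col) match {
      case Some(_) => None                            // static: not in temp path
      case None    => Some(s"$col=${rowValues(col)}") // dynamic: from the row
    }
  }.mkString("/", "/", "")
```

For "partition(part1=val1, part2)" and a row whose part2 value is "c1" this yields "/part2=c1"; with both partitions dynamic it yields "/part1=a/part2=c1", matching the two cases above.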
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-54966993

After checking the consoleFull output, an error occurs when running the test "full outer join":

[info] - full outer join 05:02:22.633 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 428.0 (TID 48876)
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: n#12
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)

I think the error is unrelated to this PR.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55067304 I have updated the file according to liancheng's comment and tested it locally.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55225133 Steps to verify this PR with SparkSQLCLIDriver: first, create two tables by running the following SQL:
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55278210 Can this PR be merged? :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on a diff in the pull request: https://github.com/apache/spark/pull/2226#discussion_r17731044

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -522,6 +523,52 @@ class HiveQuerySuite extends HiveComparisonTest {
 case class LogEntry(filename: String, message: String)
 case class LogFile(name: String)

+  createQueryTest("dynamic_partition",
+    """
+      |DROP TABLE IF EXISTS dynamic_part_table;
+      |CREATE TABLE dynamic_part_table(intcol INT) PARTITIONED BY (partcol1 INT, partcol2 INT);
+      |
+      |SET hive.exec.dynamic.partition.mode=nonstrict;
+      |
+      |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+      |SELECT 1, 1, 1 FROM src WHERE key=150;
+      |
+      |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+      |SELECT 1, NULL, 1 FROM src WHERE key=150;
+      |
+      |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+      |SELECT 1, 1, NULL FROM src WHERE key=150;
+      |
+      |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+      |SELECT 1, NULL, NULL FROM src WHERE key=150;
+      |
+      |DROP TABLE IF EXISTS dynamic_part_table;
+    """.stripMargin)
--- End diff --

Checking that the data is in the correct partitions just means checking that the data is in the correct folders.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-56473456 thanks a lot to @liancheng :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-56616245 I have removed the quotation marks from the title, @marmbrus
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-56621834 I think I should say thank you to @liancheng and @yhuai. During the communication with you, I have learned a lot :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-56770301 hi @marmbrus, would you please run the merge script again? :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/1919 [SPARK-3007][SQL]Add "Dynamic Partition" support to Spark Sql hive

For details, please refer to the comments on https://issues.apache.org/jira/browse/SPARK-3007

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/baishuo/spark patch-1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1919.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1919

commit d3e206e1a2fadc271e365462bd93730e31a094eb Author: baishuo Date: 2014-08-12T17:27:54Z Update HiveQl.scala
commit b22857a365925a428c41dd3e93d0da3613053071 Author: baishuo Date: 2014-08-12T17:29:36Z Update SparkHadoopWriter.scala
commit bade51d4726b8c55de83fef5c3e42c48f5af8f59 Author: baishuo Date: 2014-08-12T17:31:01Z Update InsertIntoHiveTable.scala
commit d211d330550260d93752349682e7c8447691a9e5 Author: baishuo Date: 2014-08-12T17:53:04Z Update InsertIntoHiveTable.scala
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52026271 I haven't added the related tests since I don't know how to write them, but I have tested the function with SparkSQLCLIDriver.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52141005 hi @marmbrus, while studying HiveQuerySuite.scala, I found there is an important table, src, but I could not find where and how that table is created. Would you please give more instruction? Thank you :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52583642 thanks a lot @yhuai and @liancheng :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52670830 Hi @marmbrus and @liancheng, I have made some modifications and run the tests with "sbt/sbt catalyst/test sql/test hive/test". Please help me check whether they are proper when you have time. Thank you :)
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52728496 Here I try to express my design idea clearly. Let's assume there is a table called table1, which has two columns, col1 and col2, and two partitions, part1 and part2.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52734543 I am also curious about that. I downloaded the master branch and checked the folder sql/hive/src/test/resources/golden; I found that files beginning with dynamic_partition_skip_default* or load_dyn_part* already exist.
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-52758525 Here I try to explain my design idea(the code is mostly in InsertIntoHiveTable.scala) : lets assume there is a table called table1,which has 2 columns:col1,col2, and two partitions: part1, part2. ONE: In case of just insert data to a static partition,I find when "saveAsHiveFile" finished, the data was wroten to a temporary location, then directory like: /tmp/hive-root/hive_/-ext-1,lets call it TMPLOCATION. And under TMPLOCATION, there is sub directory /part1=.../part2=... , all data was store under TMPLOCATION/part1=.../part2=... , then spark will call hive api "loadPartition" to move the files to {hivewarehouse}/{tablename}/part1=.../part2=... and update the metadata. then the whole progress is OK. If we what to implement the "dynamic partiton function", we need to use hive api "loadDynamicPartitions" to move data and update metadata. But the requirement of directory formate for "loadDynamicPartitions" is a little difference to "loadPartition": 1: In case of one static partition and one dynamic partition (HQL like " insert overwrite table table1 partition(part1=val1,part2) select a,b,c from ..."), loadDynamicPartitions need the tmp data located at TMPLOCATION/part2=c1, TMPLOCATION/part2=c2 ..., And loadDynamicPartitions will move them to {hivewarehouse}/{tablename}/part1=val1/part2=c1, {hivewarehouse}/{tablename}/part1=val1/part2=c2 , and update the metadata. 
Note that in this case loadDynamicPartitions do note need the subdir like part1=val1 under TMPLOCATION 2: In case of zero static partition and 2 dynamic partition (HQL like " insert overwrite table table1 partition(part1,part2) select a,b,x,c from ..."), loadDynamicPartitions need the tmp data located at TMPLOCATION/part1=../part2=c1, TMPLOCATION/part1=../part2=c2 ..., And loadDynamicPartitions will move them to {hivewarehouse}/{tablename}/part1=../part2=..., So if there have static partition in HQL determine how we create subdir under TMPLOCATION. That why the function "getDynamicPartDir" exist. TWO: where shall we call the "getDynamicPartDir"? must a location that we can get the values for dynamic partiton. so we call this function at "iter.map { row =>..." in the closure of "val rdd = childRdd.mapPartitions". when we get the row, we can get the values for dynamic partiton. after we get the dynamicPartPath by function getDynamicPartDir, we can pass it to next RDD by the output this RDD: serializer.serialize(outputData, standardOI) -> dynamicPartPath. (for the static partiton,dynamicPartPath is null) when the next rdd (closure in writeToFile) get the data and dynamicPartPath, we can check if the dynamicPartPath equals null. if not null. we check if there is already a corresponding writer exist in writerMap which store all writer for each partition. if there is. we use this writer to write the record. that ensure the data belongs to same partition will be wroten to the same directory. loadDynamicPartitions require there is no other files under TMPLOCATION except the subdir for dynamic partition. that why there are several "if (dynamicPartNum == 0)" in writeToFile --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
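The writer-per-partition routing described in the comment above can be sketched as follows. This is a hypothetical Java illustration, not the actual Spark code (which lives in InsertIntoHiveTable.scala and writes files under TMPLOCATION rather than strings); the class and method names here are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the idea: each record carrying a dynamic-partition path is
// routed to one writer per path, so all rows of the same partition end up
// under the same TMPLOCATION subdirectory (e.g. part2=c1).
public class DynamicPartitionRouter {
    // Maps a partition subdirectory (e.g. "part2=c1") to the rows written there;
    // stands in for the writerMap of output writers described above.
    private final Map<String, StringBuilder> writers = new HashMap<>();

    // Builds the subdirectory from a dynamic-partition column only; the static
    // partition (part1=val1) needs no subdirectory under TMPLOCATION.
    static String dynamicPartDir(String column, String value) {
        return column + "=" + value;
    }

    void write(String dynamicPartPath, String record) {
        // Reuse the existing writer for this partition if one exists,
        // otherwise create it; this is the writerMap lookup in writeToFile.
        writers.computeIfAbsent(dynamicPartPath, p -> new StringBuilder())
               .append(record).append('\n');
    }

    Map<String, StringBuilder> writers() { return writers; }
}
```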
[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/1919#issuecomment-53390280 Hi @marmbrus, I have updated the files related to the tests, and all tests passed on my machine. Would you please help verify this patch when you have time :) I have written out the thinking behind the code. Thank you. @rxin @liancheng
[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/2157 [SPARK-3241][SQL] create NumberFormat instance by threadsafe way You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark patch-threadlocal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2157 commit 5d05a01d7737ee86ed42cb004b01d0cf22d4d695 Author: baishuo Date: 2014-08-27T03:12:24Z create NumberFormat instance by threadsafe way
[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/2157#issuecomment-53544332 thank you @chenghao-intel. I think I didn't express my idea clearly: the ThreadLocal is there to ensure there is one and only one NumberFormat instance per thread. Otherwise, if open were called more than once, there could be more than one NumberFormat instance.
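The one-instance-per-thread idea above can be sketched with a ThreadLocal. This is a minimal illustration, not the PR's actual code; the `SafeFormat` class and `open` method names are invented for the example.

```java
import java.text.NumberFormat;

// NumberFormat is not thread-safe, so each thread lazily creates exactly one
// instance and reuses it on every subsequent call, no matter how many times
// open() is invoked on that thread.
public class SafeFormat {
    private static final ThreadLocal<NumberFormat> FORMAT =
        ThreadLocal.withInitial(NumberFormat::getIntegerInstance);

    // Simulates a repeated "open" call: always hands back the current
    // thread's single NumberFormat instance.
    static NumberFormat open() {
        return FORMAT.get();
    }
}
```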
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37779718 Hey @pwendell, I think the "-Xdebug -Xrunjdwp..." options may not work when placed after the classpath? In my previous work, I always set them before "-cp".
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37787235 @srowen thank you @pwendell no problem
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37804560 @mateiz I have updated monitoring.md, but I don't know how to send a pull request with this file separately
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37954039 Oh, I see! Please let me do that
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-38272451 please close this PR, thank you
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-38272488 please close this PR, thank you @pwendell
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-38281673 Hi @pwendell @mateiz, I'm a new user of GitHub; could you please teach me how to undo my recent commit (I just want to make my master branch the same as spark:master)
[GitHub] spark pull request: Branch 0.9
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/196#issuecomment-38288017 sorry, I made a wrong commit; please close it
[GitHub] spark pull request: Update spark-daemon.sh
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/197 Update spark-daemon.sh Since the previous command is cd "$SPARK_PREFIX", we can call spark-class by "./bin/spark-class" instead of "$SPARK_PREFIX"/bin/spark-class You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark branch-0.9 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/197.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #197 commit a7c3da8d7dba299e4edb13a37a81766b9a2200df Author: baishuo(白硕) Date: 2014-03-21T15:28:19Z Update spark-daemon.sh Since the previous command is cd "$SPARK_PREFIX", we can call spark-class by "./bin/spark-class" instead of "$SPARK_PREFIX"/bin/spark-class
[GitHub] spark pull request: Update spark-daemon.sh
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/197#issuecomment-38290298 ./bin/spark-class or bin/spark-class, which is better?
[GitHub] spark pull request: Branch 0.9
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/196
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-38292256 close it
[GitHub] spark pull request: Update CommandUtils.scala
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/157
[GitHub] spark pull request: Update spark-daemon.sh
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/197#issuecomment-38298607 or delete 'cd "$SPARK_PREFIX"'?
[GitHub] spark pull request: Update slaves.sh
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/238 Update slaves.sh update the comment You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/238.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #238 commit 0f08299625bf6135365e8127cb6cfbca2162c909 Author: baishuo(白硕) Date: 2014-03-26T16:03:45Z Update slaves.sh update the comment
[GitHub] spark pull request: Update slaves.sh
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/238#issuecomment-38713391 yeah, got it. thank you @pwendell. close it
[GitHub] spark pull request: Update slaves.sh
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/238
[GitHub] spark pull request: Update spark-daemon.sh
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/197
[GitHub] spark pull request: Update spark-daemon.sh
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/197#issuecomment-38762615 close now
[GitHub] spark pull request: Update spark-daemon.sh
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/254 Update spark-daemon.sh I think we do not need 'cd "$SPARK_PREFIX"' to run spark-class. Am I right? You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark aaa Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/254.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #254 commit 6a41627c57e842833a7b0d5f4e9f46b4e58fa1e7 Author: baishuo(白硕) Date: 2014-03-27T13:36:48Z Update spark-daemon.sh I think we do not need 'cd "$SPARK_PREFIX"' to run spark-class. Am I right?
[GitHub] spark pull request: Update GradientDescentSuite.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/256 Update GradientDescentSuite.scala use a faster way to construct an Array You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark bbb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #256 commit a0d82044d14ab0c20fc93fe25e14753a963a9170 Author: baishuo(白硕) Date: 2014-03-27T15:04:49Z Update GradientDescentSuite.scala use a faster way to construct an Array
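The spirit of the change above is building a pre-filled array with one bulk call instead of assigning element by element. A hypothetical Java illustration (the actual suite is Scala; the `ArrayInit` class and method names here are invented for the example):

```java
import java.util.Arrays;

// Two equivalent ways to build an array of n copies of a value.
public class ArrayInit {
    static double[] withLoop(int n, double v) {
        double[] a = new double[n];
        for (int i = 0; i < n; i++) a[i] = v; // element-by-element assignment
        return a;
    }

    static double[] withFill(int n, double v) {
        double[] a = new double[n];
        Arrays.fill(a, v); // single bulk call producing the same contents
        return a;
    }
}
```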
[GitHub] spark pull request: Update WindowedDStream.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/390 Update WindowedDStream.scala update the content of the Exception thrown when windowDuration is not a multiple of parent.slideDuration You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark windowdstream Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/390.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #390 commit 533c96828cbc54ef7f8e061027bd31cb233b76be Author: baishuo(白硕) Date: 2014-04-11T08:50:56Z Update WindowedDStream.scala update the content of the Exception thrown when windowDuration is not a multiple of parent.slideDuration
[GitHub] spark pull request: Update WindowedDStream.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/390#issuecomment-40275299 thank you @pwendell
[GitHub] spark pull request: Update WindowedDStream.scala
Github user baishuo closed the pull request at: https://github.com/apache/spark/pull/390
[GitHub] spark pull request: Update WindowedDStream.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/390#issuecomment-40299063 no problem @pwendell
[GitHub] spark pull request: Update ReducedWindowedDStream.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/425 Update ReducedWindowedDStream.scala change _slideDuration to _windowDuration You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/425.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #425 commit 6f09ea1e6c2892a6f04a197931d4385a8c3cee2d Author: baishuo(白硕) Date: 2014-04-16T09:42:09Z Update ReducedWindowedDStream.scala
[GitHub] spark pull request: Update KafkaWordCount.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/523 Update KafkaWordCount.scala modify the required args number You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/523.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #523 commit 0368ba9a404cece382010d1020a698c29b20e964 Author: baishuo(白硕) Date: 2014-04-24T03:00:29Z Update KafkaWordCount.scala modify the required args number
[GitHub] spark pull request: Update KafkaWordCount.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/523#issuecomment-41361697 I think there need to be at least 4 arguments; am I right?
[GitHub] spark pull request: Update GradientDescentSuite.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/256#issuecomment-41657234 no problem @mengxr
[GitHub] spark pull request: Update GradientDescentSuite.scala
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/588 Update GradientDescentSuite.scala use a faster way to construct an array You can merge this pull request into a Git repository by running: $ git pull https://github.com/baishuo/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/588.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #588 commit b666d27bbdde653c98eeb9b8c96ad10d6fd2a110 Author: baishuo(白硕) Date: 2014-04-29T13:03:24Z Update GradientDescentSuite.scala use a faster way to construct an array
[GitHub] spark pull request: Update GradientDescentSuite.scala
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/256#issuecomment-41673035 Hi @mengxr, the new PR is https://github.com/apache/spark/pull/588; please see whether it can be merged, thank you. Closing this PR.