[GitHub] spark pull request #15914: delete temporary folder after insert hive table

2016-11-16 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/15914

delete temporary folder after insert hive table

## What changes were proposed in this pull request?

Modify the code of InsertIntoHiveTable.scala to fix
https://issues.apache.org/jira/browse/SPARK-14974
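The fix described above can be sketched roughly as follows. This is an illustration only, not the actual InsertIntoHiveTable code: all names (`cleanupStaging`, `deleteRecursively`, the `.hive-staging` prefix default) are assumptions made for the example.

```scala
import java.io.File

// Recursively delete a directory and its contents.
def deleteRecursively(dir: File): Unit = {
  if (dir.isDirectory) dir.listFiles().foreach(deleteRecursively)
  dir.delete()
}

// After the insert commits, remove any leftover staging directories
// under the table directory (prefix is a hypothetical default).
def cleanupStaging(tableDir: File, stagingPrefix: String = ".hive-staging"): Unit = {
  Option(tableDir.listFiles())
    .getOrElse(Array.empty[File])
    .filter(f => f.isDirectory && f.getName.startsWith(stagingPrefix))
    .foreach(deleteRecursively)
}
```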

## How was this patch tested?

I think this patch can be tested manually.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark SPARK-14974-20161117

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15914


commit cb08136733b4b4dc48e488e33525dcebb715a75f
Author: baishuo 
Date:   2016-11-17T06:35:29Z

delete temporary folder after insert hive table




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14262: [SPARK-14974][SQL]delete temporary folder after insert h...

2016-11-16 Thread baishuo
Github user baishuo commented on the issue:

https://github.com/apache/spark/pull/14262
  
Closing this PR and opening the same change based on the new master branch:
https://github.com/apache/spark/pull/15914





[GitHub] spark pull request #14262: [SPARK-14974][SQL]delete temporary folder after i...

2016-11-16 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/14262





[GitHub] spark pull request: [SPARK-5084][SQL]add if not exists after creat...

2015-01-06 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/3895#issuecomment-68840957
  
I have modified some code and tested it locally.





[GitHub] spark pull request: [SPARK-4908][SQL][hotfix]narrow the scope of s...

2015-01-11 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/4001

[SPARK-4908][SQL][hotfix]narrow the scope of synchronized for PR 3834

Compared with https://github.com/apache/spark/pull/3834, this PR narrows 
the scope of the synchronized block.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark SPARK-4908-20141231

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4001.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4001


commit 4bfa3067f6d1494c770d49375498cf1b4adbaa45
Author: baishuo 
Date:   2015-01-12T07:06:14Z

narrow the scope of synchronized for PR 3834







[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-14 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/4001#issuecomment-69883507
  
Hi @liancheng and @marmbrus, I have removed [hotfix] from the title.





[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-14 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/4001#issuecomment-69884566
  
Indeed, the code passed all the tests when I ran them locally. I added 
[hotfix] to the title just to indicate that this is not the final 
solution for [SPARK-4908].





[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-14 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/4001#issuecomment-70046499
  
Hi @marmbrus ,can this PR be merged? :)





[GitHub] spark pull request: [SPARK-5084][SQL]add if not exists after creat...

2015-01-14 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/3895#issuecomment-70046564
  
Hi @marmbrus ,can this PR be merged? :)





[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-20 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/4001#issuecomment-70778989
  
No problem, closing it :)





[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-20 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/4001





[GitHub] spark pull request: Update BasicOperationsSuite.scala

2014-06-22 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1084#issuecomment-46786699
  
Let me check.




[GitHub] spark pull request: Update BasicOperationsSuite.scala

2014-06-30 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/1084




[GitHub] spark pull request: Update BasicOperationsSuite.scala

2014-06-30 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1084#issuecomment-47611820
  
closed




[GitHub] spark pull request: Update SQLConf.scala

2014-06-30 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/1272

Update SQLConf.scala

Use java.util.concurrent.ConcurrentHashMap instead of java.util.Collections.synchronizedMap.
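The trade-off behind this change can be sketched as follows. Both maps are thread-safe for individual operations, but `Collections.synchronizedMap` takes a single lock for every access, while `ConcurrentHashMap` allows lock-free reads, which suits a read-heavy configuration map. This is a minimal illustration, not the actual SQLConf code:

```scala
import java.util.concurrent.ConcurrentHashMap

// Old approach: every get/put contends on one monitor.
val synchronizedSettings =
  java.util.Collections.synchronizedMap(new java.util.HashMap[String, String]())

// New approach: reads do not block each other.
val concurrentSettings = new ConcurrentHashMap[String, String]()

concurrentSettings.put("spark.sql.shuffle.partitions", "200")
```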

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1272


commit 0740f28b04a43ac739d6f45a7ffc6fa23fe7b96c
Author: baishuo(白硕) 
Date:   2014-07-01T03:29:12Z

Update SQLConf.scala

Use java.util.concurrent.ConcurrentHashMap instead of java.util.Collections.synchronizedMap.






[GitHub] spark pull request: Update SQLConf.scala

2014-06-30 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-47621874
  
I have added some synchronized blocks; please check whether this is 
thread-safe. Jenkins should test this once more.




[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-47625269
  
Thanks @aarondav, I have modified the code according to your comment; 
please check whether it is correct.




[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-47627343
  
Hi @rxin, I have removed the indent spacing on:
def set(props: Properties): Unit = {
  props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
}
Please help me check whether it is correct.




[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread baishuo
Github user baishuo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1272#discussion_r14392768
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -50,8 +50,7 @@ trait SQLConf {
   /** ** SQLConf functionality methods  */
 
   @transient
-  private val settings = java.util.Collections.synchronizedMap(
-    new java.util.HashMap[String, String]())
+  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()
--- End diff --

Reverted to Collections.synchronizedMap.




[GitHub] spark pull request: Update SQLConf.scala

2014-07-01 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-47634643
  
Hi @rxin, what is the proper way to modify this? Should I use
settings.synchronized {
   ...
}
to ensure thread safety?
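The question above touches a real subtlety: even a concurrent map does not make a compound check-then-act sequence atomic, so either an explicit lock or the map's own atomic methods are needed. A minimal sketch, with hypothetical keys and values:

```scala
import java.util.concurrent.ConcurrentHashMap

val settings = new ConcurrentHashMap[String, String]()

// A check-then-put is two operations; without coordination another thread
// could interleave between them. One option is an explicit lock:
settings.synchronized {
  if (!settings.containsKey("key")) settings.put("key", "default")
}

// A simpler option is the map's own atomic method:
settings.putIfAbsent("key2", "default")
```

In general the atomic method is preferred, since it avoids mixing external locking with a concurrent collection.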




[GitHub] spark pull request: Update SQLConf.scala

2014-07-02 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-47865791
  
Modified according to @cloud-fan's comment.




[GitHub] spark pull request: Update SQLConf.scala

2014-07-03 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1272#issuecomment-48005584
  
Sorry about the compile error; I have changed orElse to getOrElse. Thank 
you @rxin.




[GitHub] spark pull request: Update MultiInstanceRelation.scala

2014-07-06 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/1312

Update MultiInstanceRelation.scala

I think that if multiAppearance is empty, we can return the plan directly.
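The proposed early-return guard can be sketched generically. This is illustrative only; the names here are stand-ins, not the actual MultiInstanceRelation code:

```scala
// If nothing appears more than once, skip the rewrite entirely and
// return the plan unchanged; otherwise apply the rewrite function.
def newInstances[A](plan: A, multiAppearance: Set[String])(rewrite: A => A): A =
  if (multiAppearance.isEmpty) plan else rewrite(plan)
```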

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark test-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1312


commit b779773356ba475085ea425a1c8a23048b13fd4f
Author: baishuo(白硕) 
Date:   2014-07-07T02:26:43Z

Update MultiInstanceRelation.scala

I think that if multiAppearance is empty, we can return the plan directly.






[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala

2014-07-08 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1312#issuecomment-48352678
  
Can one of the admins verify this patch?:)




[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala

2014-07-08 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1312#issuecomment-48421247
  
Thank you @marmbrus; closing this PR.




[GitHub] spark pull request: [SQL]Update MultiInstanceRelation.scala

2014-07-08 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/1312




[GitHub] spark pull request: Update HiveMetastoreCatalog.scala

2014-07-24 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/1569

Update HiveMetastoreCatalog.scala

I think it's better to define hiveQlTable as a val

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1569.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1569


commit a7b32a28a59886dfac45331d781a548fc18b098f
Author: baishuo(白硕) 
Date:   2014-07-24T07:01:33Z

Update HiveMetastoreCatalog.scala

I think it's better to define hiveQlTable as a val






[GitHub] spark pull request: [SQL]Update HiveMetastoreCatalog.scala

2014-07-24 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1569#issuecomment-50101782
  
Modified the title to add [SQL].




[GitHub] spark pull request: [SQL]Update HiveMetastoreCatalog.scala

2014-07-25 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1569#issuecomment-50180682
  
Thank you @marmbrus. I have changed it to "@transient lazy val" and run 
"sbt/sbt catalyst/test sql/test hive/test" on the master branch; all tests passed. 
With only "val", the tests do not pass.

The following is the tail of the test results:

[info] - Partition pruning - with filter on string partition key - query 
test
[info] - Partition pruning - with filter on int partition key - pruning test
[info] - Partition pruning - with filter on int partition key - query test
[info] - Partition pruning - left only 1 partition - pruning test
[info] - Partition pruning - left only 1 partition - query test
[info] - Partition pruning - all partitions pruned - pruning test
[info] - Partition pruning - all partitions pruned - query test
[info] - Partition pruning - pruning with both column key and partition key 
- pruning test
[info] - Partition pruning - pruning with both column key and partition key 
- query test
[info] HiveResolutionSuite:
[info] - table.attr
[info] - database.table
[info] - database.table table.attr
[info] - alias.attr
[info] - subquery-alias.attr
[info] - quoted alias.attr
[info] - attr
[info] - alias.star
[info] - case insensitivity with scala reflection
[info] - nested repeated resolution
[info] BigDataBenchmarkSuite:
[info] - No data files found for BigDataBenchmark tests. !!! IGNORED !!!
[info] ScalaTest
[info] Run completed in 2 minutes, 55 seconds.
[info] Total number of tests run: 150
[info] Suites: completed 14, aborted 0
[info] Tests: succeeded 150, failed 0, canceled 0, ignored 7, pending 0
[info] All tests passed.
[info] Passed: Total 150, Failed 0, Errors 0, Passed 150, Ignored 7
[success] Total time: 267 s, completed Jul 25, 2014 10:17:09 AM
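The difference the comment describes can be illustrated in miniature. This is a hedged sketch, not the actual HiveMetastoreCatalog code: a `lazy val` is recomputed on first access, so after deserialization a `@transient lazy val` can rebuild a field that was not serialized, whereas a plain eager `val` would either have to be serializable or come back uninitialized.

```scala
class Meta extends Serializable {
  // @transient: skipped during serialization; lazy: rebuilt on first
  // access after deserialization, e.g. on an executor.
  @transient lazy val hiveQlTable: String = buildTable()

  // Stand-in for constructing the real (non-serializable) Hive Table object.
  private def buildTable(): String = "table"
}
```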




[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...

2014-10-18 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/2842

[SPARK-3999][deploy] resolve the wrong number of arguments for pattern error

AssociationErrorEvent, which is provided by 
akka-remote_2.10-2.2.3-shaded-protobuf.jar, only has 4 arguments.
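Scala's "wrong number of arguments for pattern" error arises when a case-class extractor is matched with a different arity than the class declares. The class below is a stand-in for illustration, not Akka's real AssociationErrorEvent:

```scala
// Hypothetical 4-field event, mirroring the arity issue described above.
case class AssocError(cause: Throwable, local: String, remote: String, inbound: Boolean)

def describe(e: AssocError): String = e match {
  // The pattern must bind exactly 4 sub-patterns; matching with 5 (as
  // against a newer class version) fails to compile.
  case AssocError(_, local, remote, _) => s"$local -> $remote"
}
```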

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark testAkka

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2842.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2842


commit ab328948c5efca9807ad4342c63047a2b1889197
Author: baishuo 
Date:   2014-10-18T16:13:37Z

modify the number of arguments







[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...

2014-10-19 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2842#issuecomment-59688527
  
@JoshRosen @pwendell I found the cause of this problem. In IDEA, I 
should right-click the project and select Maven -> Reimport.





[GitHub] spark pull request: [SPARK-3999][deploy] resolve the wrong number ...

2014-10-19 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/2842





[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...

2014-10-21 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/2876

[SPARK-4034]change the scope of guava to compile

After clicking Maven -> Reimport for the Spark project in IDEA and then starting 
"sparksqlclidriver" in IDEA, we get an exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.spark.util.Utils$.(Utils.scala:611)
at org.apache.spark.util.Utils$.(Utils.scala)
at org.apache.spark.SparkContext.(SparkContext.scala:178)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:36)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:256)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:149)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

This is caused by the fact that after Maven -> Reimport is clicked, the scope of 
guava*.jar in the spark-hive-thriftserver project is changed to provided (right-click 
the spark-hive-thriftserver project and choose the Dependencies tab to see each jar's 
scope in this project). We can change it to "compile" and restart SparkSQLCLIDriver, 
and the exception disappears. But if we rerun Maven -> Reimport, the scope of 
guava*.jar reverts to "provided".


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-4034-pom-provided

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2876


commit 17c41b4552dfef37ad6d89498546695e066268dd
Author: baishuo 
Date:   2014-10-21T10:14:55Z

change the scope of guava to compile







[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...

2014-10-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2876#issuecomment-59907899
  
I think the root cause is that the scope of guava in the root pom.xml is 
"provided". Every time we do a reimport (right-click the whole project, then 
Maven -> Reimport), the scope is set back to "provided", which causes the 
exception. If we change it to "compile", the exception never occurs.





[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...

2014-10-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2876#issuecomment-59910134
  
Hi @srowen and @vanzin, if we do not reimport, there is no problem. 
But if we do (reimporting helps IDEA refresh the jars) and then run 
SparkSQLCLIDriver, the exception occurs. And I think someone who hits this 
exception may not find the way to resolve it quickly.





[GitHub] spark pull request: [SPARK-4034]change the scope of guava to compi...

2014-10-23 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2876#issuecomment-60341700
  
Hi @vanzin, I have modified 4 pom.xml files, changing the scope of guava to 
"provided" in the root pom.xml. All tests in the sql project pass. Can this 
change be tested?





[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...

2014-08-27 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/2157





[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...

2014-08-27 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2157#issuecomment-53669323
  
yes, no problem :) closing this issue





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-01 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/2226

[SPARK-3007][SQL]Add "Dynamic Partition" support to Spark Sql hive

a new PR based on the new master; the changes are the same as
https://github.com/apache/spark/pull/1919

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-3007

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2226


commit d3e206e1a2fadc271e365462bd93730e31a094eb
Author: baishuo(白硕) 
Date:   2014-08-12T17:27:54Z

Update HiveQl.scala

commit b22857a365925a428c41dd3e93d0da3613053071
Author: baishuo(白硕) 
Date:   2014-08-12T17:29:36Z

Update SparkHadoopWriter.scala

commit bade51d4726b8c55de83fef5c3e42c48f5af8f59
Author: baishuo(白硕) 
Date:   2014-08-12T17:31:01Z

Update InsertIntoHiveTable.scala

commit d211d330550260d93752349682e7c8447691a9e5
Author: baishuo(白硕) 
Date:   2014-08-12T17:53:04Z

Update InsertIntoHiveTable.scala

commit f0f620d277ecc7e342c42d88e5b12062eecd8261
Author: baishuo(白硕) 
Date:   2014-08-18T06:29:21Z

Update HiveCompatibilitySuite.scala

commit 412a48b185785dafb7a0ff450018e65dde7c4189
Author: baishuo(白硕) 
Date:   2014-08-18T06:34:53Z

Update InsertIntoHiveTable.scala

commit 567972c2c4ff85e9d09b2c75fbffe5891b438b1c
Author: baishuo(白硕) 
Date:   2014-08-18T06:36:58Z

Update HiveQuerySuite.scala

commit 8e51a4bc47a1f5517e99dd1ebb456ae95376d8c2
Author: baishuo(白硕) 
Date:   2014-08-18T07:18:07Z

Update Cast.scala

commit b80f2021eca650b29a7baad35ba61ece90a7fc54
Author: baishuo(白硕) 
Date:   2014-08-18T07:44:07Z

Update InsertIntoHiveTable.scala

commit 924042c3118337bb6a944e0d4e3ece46ec65dd83
Author: baishuo(白硕) 
Date:   2014-08-18T07:57:20Z

Update Cast.scala

commit af8411aeefeae90fb5c79b88b38a5d299b11ddff
Author: baishuo 
Date:   2014-08-19T16:01:49Z

update file after test

commit 0c324beaa38abfd089257466a0a0ddd6e57c5fad
Author: baishuo 
Date:   2014-08-19T17:14:53Z

do a little modify

commit 2a0e0b82cacf50552de60aead7b25e04323cd0f9
Author: baishuo 
Date:   2014-09-01T06:28:17Z

for dynamic partition
Merge branch 'patch-1' into patch-3007







[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-01 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-54032088
  
Hi @





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-02 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/1919





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-02 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-54244065
  
no problem, close this PR





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-03 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54259701
  
Hi @marmbrus and @liancheng, the latest code has passed "dev/lint-scala" and
"sbt/sbt catalyst/test sql/test hive/test" locally.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-04 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54574495
  
Can this PR be tested? The golden files related to HiveCompatibilitySuite
already exist in the master branch of Spark, so there is no need to add them.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-04 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54575671
  
can this PR be tested? :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on a diff in the pull request:

https://github.com/apache/spark/pull/2226#discussion_r17287305
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -101,62 +103,135 @@ case class InsertIntoHiveTable(
   }
 
   def saveAsHiveFile(
-  rdd: RDD[Writable],
+  rdd: RDD[(Writable, String)],
   valueClass: Class[_],
   fileSinkConf: FileSinkDesc,
-  conf: JobConf,
-  isCompressed: Boolean) {
+  conf: SerializableWritable[JobConf],
+  isCompressed: Boolean,
+  dynamicPartNum: Int) {
 if (valueClass == null) {
   throw new SparkException("Output value class not set")
 }
-conf.setOutputValueClass(valueClass)
+conf.value.setOutputValueClass(valueClass)
 if (fileSinkConf.getTableInfo.getOutputFileFormatClassName == null) {
   throw new SparkException("Output format class not set")
 }
 // Doesn't work in Scala 2.9 due to what may be a generics bug
 // TODO: Should we uncomment this for Scala 2.10?
 // conf.setOutputFormat(outputFormatClass)
-conf.set("mapred.output.format.class", 
fileSinkConf.getTableInfo.getOutputFileFormatClassName)
+conf.value.set("mapred.output.format.class",
+  fileSinkConf.getTableInfo.getOutputFileFormatClassName)
 if (isCompressed) {
   // Please note that isCompressed, "mapred.output.compress", 
"mapred.output.compression.codec",
   // and "mapred.output.compression.type" have no impact on ORC 
because it uses table properties
   // to store compression information.
-  conf.set("mapred.output.compress", "true")
+  conf.value.set("mapred.output.compress", "true")
   fileSinkConf.setCompressed(true)
-  
fileSinkConf.setCompressCodec(conf.get("mapred.output.compression.codec"))
-  
fileSinkConf.setCompressType(conf.get("mapred.output.compression.type"))
+  
fileSinkConf.setCompressCodec(conf.value.get("mapred.output.compression.codec"))
+  
fileSinkConf.setCompressType(conf.value.get("mapred.output.compression.type"))
 }
-conf.setOutputCommitter(classOf[FileOutputCommitter])
-FileOutputFormat.setOutputPath(
-  conf,
-  SparkHiveHadoopWriter.createPathFromString(fileSinkConf.getDirName, 
conf))
+conf.value.setOutputCommitter(classOf[FileOutputCommitter])
 
+FileOutputFormat.setOutputPath(
+  conf.value,
+  SparkHiveHadoopWriter.createPathFromString(fileSinkConf.getDirName, 
conf.value))
 log.debug("Saving as hadoop file of type " + valueClass.getSimpleName)
+var writer: SparkHiveHadoopWriter = null
+// Map storing writers for Dynamic Partition
+var writerMap: scala.collection.mutable.HashMap[String, 
SparkHiveHadoopWriter] = null
+if (dynamicPartNum == 0) {
+  writer = new SparkHiveHadoopWriter(conf.value, fileSinkConf)
+  writer.preSetup()
+} else {
+  writerMap =  new scala.collection.mutable.HashMap[String, 
SparkHiveHadoopWriter]
+}
 
-val writer = new SparkHiveHadoopWriter(conf, fileSinkConf)
-writer.preSetup()
-
-def writeToFile(context: TaskContext, iter: Iterator[Writable]) {
-  // Hadoop wants a 32-bit task attempt ID, so if ours is bigger than 
Int.MaxValue, roll it
-  // around by taking a mod. We expect that no task will be attempted 
2 billion times.
-  val attemptNumber = (context.attemptId % Int.MaxValue).toInt
-
+def writeToFile(context: TaskContext, iter: Iterator[(Writable, 
String)]) {
+// Hadoop wants a 32-bit task attempt ID, so if ours is bigger than 
Int.MaxValue, roll it
+// around by taking a mod. We expect that no task will be attempted 2 
billion times.
+val attemptNumber = (context.attemptId % Int.MaxValue).toInt
+// writer for No Dynamic Partition
+if (dynamicPartNum == 0) {
   writer.setup(context.stageId, context.partitionId, attemptNumber)
   writer.open()
+}
 
-  var count = 0
-  while(iter.hasNext) {
-val record = iter.next()
-count += 1
-writer.write(record)
+var count = 0
+// writer for Dynamic Partition
+var writer2: SparkHiveHadoopWriter = null
+while(iter.hasNext) {
+  val record = iter.next()
+  count += 1
+  if (record._2 == null) { // without Dynamic Partition
+writer.write(record._1)
+  } else { // for Dynamic Partition
+  val location = fileSinkConf.getDirName
+  val partLocation = loca

[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on a diff in the pull request:

https://github.com/apache/spark/pull/2226#discussion_r17290567
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -178,6 +253,40 @@ case class InsertIntoHiveTable(
 val tableLocation = table.hiveQlTable.getDataLocation
 val tmpLocation = hiveContext.getExternalTmpFileURI(tableLocation)
 val fileSinkConf = new FileSinkDesc(tmpLocation.toString, tableDesc, 
false)
+var tmpDynamicPartNum = 0
+var numStaPart = 0
+val partitionSpec = partition.map {
+  case (key, Some(value)) =>
+numStaPart += 1
+key -> value
+  case (key, None) =>
+tmpDynamicPartNum += 1
+key -> ""
--- End diff --

the Hive API will handle the ""; when Hive sees that the value is "", it
will know that there is a dynamic partition.
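
The mapping quoted above can be sketched as follows (a hypothetical illustration, not the PR's actual code): a concrete value keeps the partition static, while `None` becomes `""` so that Hive's `loadDynamicPartitions` API recognizes the column as dynamic.

```scala
// Hypothetical sketch of the partition-spec mapping described above.
// Seq of (column, Option[value]) pairs preserves the declared column order.
object PartitionSpecSketch {
  def toSpec(partition: Seq[(String, Option[String])]): Seq[(String, String)] =
    partition.map {
      case (key, Some(value)) => key -> value // static partition: keep value
      case (key, None)        => key -> ""    // dynamic partition marker
    }

  def main(args: Array[String]): Unit = {
    val spec = toSpec(Seq("part1" -> Some("val1"), "part2" -> None))
    println(spec) // List((part1,val1), (part2,))
  }
}
```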





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54949644
  
Hi @marmbrus, thanks a lot for your advice. I have modified the code
accordingly.
I try to separate the dynamic partition support by using the condition "if
(dynamicPartNum == 0)" twice: once in saveAsHiveFile and once in
writeToFile.
Please help me check whether it is proper. Thank you :)






[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54949823
  
I try to explain my design idea (the code is mostly in
InsertIntoHiveTable.scala):
Let's assume there is a table called table1, which has 2 columns, col1 and
col2, and two partition columns, part1 and part2.

First:
In the case of inserting data into a static partition, I find that when
"saveAsHiveFile" finishes, the data has been written to a temporary location,
a directory like /tmp/hive-root/hive_/-ext-1; let's call it TMPLOCATION.
Under TMPLOCATION there is a subdirectory /part1=.../part2=..., and all data
is stored under TMPLOCATION/part1=.../part2=... . Spark then calls the Hive
API "loadPartition" to move the files to
{hivewarehouse}/{tablename}/part1=.../part2=... and update the metadata, and
the whole process is done.

If we want to implement the dynamic partition feature, we need to use the
Hive API "loadDynamicPartitions" to move data and update metadata. But the
directory layout required by "loadDynamicPartitions" differs slightly from
that of "loadPartition":

1: In the case of one static partition and one dynamic partition (HQL like "
insert overwrite table table1 partition(part1=val1,part2) select a,b,c from
..."), loadDynamicPartitions needs the tmp data located at
TMPLOCATION/part2=c1 (there is NO "part1=val1"; it will be added during
loadDynamicPartitions), TMPLOCATION/part2=c2 ..., and
loadDynamicPartitions will move them to
{hivewarehouse}/{tablename}/part1=val1/part2=c1,
{hivewarehouse}/{tablename}/part1=val1/part2=c2, and update the metadata.
Note that in this case loadDynamicPartitions does not need a subdirectory
like part1=val1 under TMPLOCATION.

2: In the case of zero static partitions and 2 dynamic partitions (HQL like "
insert overwrite table table1 partition(part1,part2) select a,b,x,c from
..."), loadDynamicPartitions needs the tmp data located at
TMPLOCATION/part1=../part2=c1, TMPLOCATION/part1=../part2=c2 ..., and
loadDynamicPartitions will move them to
{hivewarehouse}/{tablename}/part1=../part2=... .

So whether there is a static partition in the HQL determines how we create
subdirectories under TMPLOCATION. That is why the function
"getDynamicPartDir" exists.

Second:
Where shall we call "getDynamicPartDir"? It must be a place where we can
get the values of the dynamic partitions, so we call this function in "iter.map {
row =>..." in the closure of "val rdd = childRdd.mapPartitions". When we get
the row, we can get the values of the dynamic partitions. After we get
dynamicPartPath from getDynamicPartDir, we can pass it to the next RDD as
part of this RDD's output: serializer.serialize(outputData, standardOI) ->
dynamicPartPath. (For a static partition, dynamicPartPath is null.)

When the next RDD (the closure in writeToFile) gets the data and
dynamicPartPath, we check whether dynamicPartPath is null. If it is not null,
we check whether a corresponding writer already exists in writerMap, which
stores a writer for each partition. If one exists, we use that writer to write
the record. This ensures that data belonging to the same partition is written
to the same directory.

loadDynamicPartitions requires that there be no other files under TMPLOCATION
except the subdirectories for dynamic partitions. That is why there are
several "if (dynamicPartNum == 0)" checks in writeToFile.
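
The directory layouts described above can be sketched as a small helper. This is a hypothetical illustration following the comment's description, not the PR's actual implementation; the name `getDynamicPartDir` follows the comment's terminology, and `__HIVE_DEFAULT_PARTITION__` is Hive's conventional name for a null partition value.

```scala
// Hypothetical sketch: derive a row's subdirectory under TMPLOCATION.
// Static partition columns are skipped (loadDynamicPartitions adds them
// itself); only dynamic columns ("" in the spec) contribute path segments.
object DynamicPartDirSketch {
  def getDynamicPartDir(partCols: Seq[(String, String)],
                        row: Map[String, String]): String =
    partCols.collect {
      case (col, "") => s"$col=${row.getOrElse(col, "__HIVE_DEFAULT_PARTITION__")}"
    }.mkString("/", "/", "")

  def main(args: Array[String]): Unit = {
    // one static (part1=val1) and one dynamic (part2) partition:
    println(getDynamicPartDir(Seq("part1" -> "val1", "part2" -> ""),
                              Map("part2" -> "c1"))) // /part2=c1
    // two dynamic partitions:
    println(getDynamicPartDir(Seq("part1" -> "", "part2" -> ""),
                              Map("part1" -> "a", "part2" -> "c1"))) // /part1=a/part2=c1
  }
}
```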





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-54966993
  
After checking the consoleFull output, an error occurs when running the
"full outer join" test:
[info] - full outer join
05:02:22.633 ERROR org.apache.spark.executor.Executor: Exception in task 
0.0 in stage 428.0 (TID 48876)
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: n#12
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)

I think the error has no relation to this PR.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-09 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-55067304
  
Updated the file according to liancheng's comment and tested it locally.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-10 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-55225133
  
Steps to verify this PR with SparkSQLCLIDriver:
First, create two tables by running the following SQL:






[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-11 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-55278210
  
can this PR be merged? :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-18 Thread baishuo
Github user baishuo commented on a diff in the pull request:

https://github.com/apache/spark/pull/2226#discussion_r17731044
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -522,6 +523,52 @@ class HiveQuerySuite extends HiveComparisonTest {
   case class LogEntry(filename: String, message: String)
   case class LogFile(name: String)
 
+  createQueryTest("dynamic_partition",
+"""
+  |DROP TABLE IF EXISTS dynamic_part_table;
+  |CREATE TABLE dynamic_part_table(intcol INT) PARTITIONED BY 
(partcol1 INT, partcol2 INT);
+  |
+  |SET hive.exec.dynamic.partition.mode=nonstrict;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, 1, 1 FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, NULL, 1 FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLE dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, 1, NULL FROM src WHERE key=150;
+  |
+  |INSERT INTO TABLe dynamic_part_table PARTITION(partcol1, partcol2)
+  |SELECT 1, NULL, NULL FROM src WHERE key=150;
+  |
+  |DROP TABLE IF EXISTS dynamic_part_table;
+""".stripMargin)
--- End diff --

Checking that the data is in the correct partitions just means checking that
the data is in the correct folders.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-09-22 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56473456
  
thanks a lot to @liancheng :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...

2014-09-23 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56616245
  
I removed the quotation marks from the title, @marmbrus





[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...

2014-09-23 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56621834
  
I think I should say thank you to @liancheng and @yhuai. During the
communication with you, I have learned a lot :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition support...

2014-09-24 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2226#issuecomment-56770301
  
hi @marmbrus, would you please run the merge script again? :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-13 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/1919

[SPARK-3007][SQL]Add "Dynamic Partition" support to Spark Sql hive

the detail please refer the comment of 
https://issues.apache.org/jira/browse/SPARK-3007

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1919.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1919


commit d3e206e1a2fadc271e365462bd93730e31a094eb
Author: baishuo(白硕) 
Date:   2014-08-12T17:27:54Z

Update HiveQl.scala

commit b22857a365925a428c41dd3e93d0da3613053071
Author: baishuo(白硕) 
Date:   2014-08-12T17:29:36Z

Update SparkHadoopWriter.scala

commit bade51d4726b8c55de83fef5c3e42c48f5af8f59
Author: baishuo(白硕) 
Date:   2014-08-12T17:31:01Z

Update InsertIntoHiveTable.scala

commit d211d330550260d93752349682e7c8447691a9e5
Author: baishuo(白硕) 
Date:   2014-08-12T17:53:04Z

Update InsertIntoHiveTable.scala







[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-13 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52026271
  
I did not add a related test since I do not know how to write it, but I have
tested the function with SparkSQLCLIDriver.





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-13 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52141005
  
hi @marmbrus, while studying HiveQuerySuite.scala I found an important
table, src, but I did not find where and how this table is created. Would you
please give more instruction? Thank you :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-18 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52583642
  
thanks a lot @yhuai and @liancheng:)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-19 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52670830
  
Hi @marmbrus and @liancheng, I have made some modifications and run the
tests with "sbt/sbt catalyst/test sql/test hive/test". Please help me check
whether it is proper when you have time. Thank you :)





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-19 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52728496
  
here I try to express my design idea clearly: 
let's assume there is a table called table1, which has two columns (col1, 
col2) and two partitions (part1, part2).






[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-19 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52734543
  
I am also curious about that.
I downloaded the master branch and checked the folder 
sql/hive/src/test/resources/golden,
and found that files beginning with dynamic_partition_skip_default* or 
load_dyn_part* already exist. 





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-20 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-52758525
  
Here I try to explain my design idea (the code is mostly in 
InsertIntoHiveTable.scala):
Let's assume there is a table called table1, which has two columns (col1, 
col2) and two partitions (part1, part2).

ONE:
In the case of inserting data into a static partition only, I found that when 
"saveAsHiveFile" finishes, the data has been written to a temporary location, 
a directory like /tmp/hive-root/hive_/-ext-1; let's call it TMPLOCATION. 
Under TMPLOCATION there is a subdirectory /part1=.../part2=..., and all data 
is stored under TMPLOCATION/part1=.../part2=.... Spark then calls the Hive 
API "loadPartition" to move the files to 
{hivewarehouse}/{tablename}/part1=.../part2=... and update the metadata, and 
the whole process is done.

If we want to implement the "dynamic partition" function, we need to use the 
Hive API "loadDynamicPartitions" to move the data and update the metadata. 
But the directory layout that "loadDynamicPartitions" requires differs 
slightly from that of "loadPartition":

1: In the case of one static partition and one dynamic partition (HQL like "
insert overwrite table table1 partition(part1=val1,part2) select a,b,c from 
..."), loadDynamicPartitions needs the temporary data located at 
TMPLOCATION/part2=c1, TMPLOCATION/part2=c2, ..., and it will move them to 
{hivewarehouse}/{tablename}/part1=val1/part2=c1, 
{hivewarehouse}/{tablename}/part1=val1/part2=c2 and update the metadata. 
Note that in this case loadDynamicPartitions does not need a subdirectory 
like part1=val1 under TMPLOCATION.

2: In the case of zero static partitions and two dynamic partitions (HQL like 
"insert overwrite table table1 partition(part1,part2) select a,b,x,c from 
..."), loadDynamicPartitions needs the temporary data located at 
TMPLOCATION/part1=../part2=c1, TMPLOCATION/part1=../part2=c2, ..., and it 
will move them to 
{hivewarehouse}/{tablename}/part1=../part2=....

So whether the HQL contains a static partition determines how we create the 
subdirectories under TMPLOCATION. That is why the function 
"getDynamicPartDir" exists.

TWO:
Where should we call "getDynamicPartDir"? It must be somewhere we can get the 
values of the dynamic partitions, so we call it in "iter.map { row =>..." in 
the closure of "val rdd = childRdd.mapPartitions": once we have the row, we 
have the values of the dynamic partitions. After we obtain the 
dynamicPartPath from getDynamicPartDir, we pass it to the next RDD through 
this RDD's output: serializer.serialize(outputData, standardOI) -> 
dynamicPartPath. (For a static partition, dynamicPartPath is null.)

When the next RDD (the closure in writeToFile) receives the data and the 
dynamicPartPath, we check whether dynamicPartPath is null. If it is not null, 
we check whether a corresponding writer already exists in writerMap, which 
stores a writer for each partition. If one exists, we use that writer to 
write the record; this ensures that data belonging to the same partition is 
written to the same directory.

loadDynamicPartitions requires that there be no files under TMPLOCATION other 
than the subdirectories for the dynamic partitions; that is why there are 
several "if (dynamicPartNum == 0)" checks in writeToFile.
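The writer-per-partition bookkeeping described above can be sketched roughly 
like this (a minimal, self-contained illustration of the idea, not Spark's 
actual InsertIntoHiveTable code; Row, getDynamicPartDir, and the 
StringBuilder "writers" are hypothetical stand-ins):

```scala
import scala.collection.mutable

// Hypothetical row type: two data columns plus one dynamic partition value.
case class Row(col1: String, col2: String, part2: String)

// Stand-in for getDynamicPartDir: builds the subdirectory that
// loadDynamicPartitions expects under TMPLOCATION (static parts omitted).
def getDynamicPartDir(row: Row): String = s"part2=${row.part2}"

// One "writer" per dynamic partition directory.
val writerMap = mutable.Map.empty[String, StringBuilder]

def writeRecord(row: Row): Unit = {
  val dynamicPartPath = getDynamicPartDir(row)
  // Reuse the writer for this partition dir if one exists, else create it,
  // so records of the same partition land in the same directory.
  val writer = writerMap.getOrElseUpdate(dynamicPartPath, new StringBuilder)
  writer.append(s"${row.col1},${row.col2}\n")
}

Seq(Row("a", "1", "c1"), Row("b", "2", "c2"), Row("c", "3", "c1"))
  .foreach(writeRecord)
// writerMap now holds one writer per partition: part2=c1 and part2=c2,
// with both c1 records appended to the same writer.
```

The key design point is the getOrElseUpdate lookup: it is what guarantees a 
single output location per dynamic partition value.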





[GitHub] spark pull request: [SPARK-3007][SQL]Add "Dynamic Partition" suppo...

2014-08-26 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/1919#issuecomment-53390280
  
Hi @marmbrus, I have updated the files related to the tests, and all tests 
pass on my machine. Would you please help verify this patch when you have 
time? :) I have written up the thinking behind the code. Thank you. 
@rxin @liancheng 





[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...

2014-08-26 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/2157

[SPARK-3241][SQL] create NumberFormat instance by threadsafe way



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-threadlocal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2157.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2157


commit 5d05a01d7737ee86ed42cb004b01d0cf22d4d695
Author: baishuo 
Date:   2014-08-27T03:12:24Z

create NumberFormat instance by threadsafe way







[GitHub] spark pull request: [SPARK-3241][SQL] create NumberFormat instance...

2014-08-27 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/2157#issuecomment-53544332
  
thank you @chenghao-intel. I think I did not express my idea clearly: the 
ThreadLocal is there to ensure that there is one and only one NumberFormat 
instance per thread. Otherwise, if open were called more than once, there 
could be more than one NumberFormat instance.
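The thread-confinement idea can be sketched like this (a minimal 
illustration, not the patch itself; FormatHolder and format are hypothetical 
names):

```scala
import java.text.NumberFormat

object FormatHolder {
  // initialValue runs once per thread, so repeated numberFormat.get() calls
  // on the same thread return the same NumberFormat instance, while distinct
  // threads never share one (NumberFormat is not thread-safe).
  private val numberFormat = new ThreadLocal[NumberFormat] {
    override def initialValue(): NumberFormat = NumberFormat.getInstance()
  }

  def format(partition: Int): String = {
    val nf = numberFormat.get()
    nf.setMinimumIntegerDigits(5)
    nf.setGroupingUsed(false)
    nf.format(partition.toLong)
  }
}

val name = FormatHolder.format(7)  // zero-padded to five digits
```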





[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-37779718
  
Hey @pwendell, I think the "-Xdebug -Xrunjdwp..." options perhaps cannot be 
placed after the classpath? In my previous work, I always set them before "-cp".




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-37787235
  
@srowen thank you 
 @pwendell no problem




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-17 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-37804560
  
@mateiz I have updated monitoring.md, but I don't know how to send a pull 
request with this file separately. 




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-18 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-37954039
  
Oh, I see. Please let me do that.




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-38272451
  
Please close this PR, thank you.




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-38272488
  
Please close this PR, thank you @pwendell.




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-38281673
  
Hi @pwendell @mateiz, I'm a new user of GitHub. Could you please teach me how 
to undo my recent commits? (I just want to make my master branch the same as 
spark:master.)




[GitHub] spark pull request: Branch 0.9

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/196#issuecomment-38288017
  
Sorry, I made a wrong commit; please close it.




[GitHub] spark pull request: Update spark-daemon.sh

2014-03-21 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/197

Update spark-daemon.sh

Since the previous command is cd "$SPARK_PREFIX", we can invoke spark-class 
as "./bin/spark-class" instead of "$SPARK_PREFIX"/bin/spark-class.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark branch-0.9

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/197.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #197


commit a7c3da8d7dba299e4edb13a37a81766b9a2200df
Author: baishuo(白硕) 
Date:   2014-03-21T15:28:19Z

Update spark-daemon.sh

Since the previous command is cd "$SPARK_PREFIX", we can invoke spark-class 
as "./bin/spark-class" instead of "$SPARK_PREFIX"/bin/spark-class.






[GitHub] spark pull request: Update spark-daemon.sh

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/197#issuecomment-38290298
  
./bin/spark-class or bin/spark-class, which is better?




[GitHub] spark pull request: Branch 0.9

2014-03-21 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/196




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/157#issuecomment-38292256
  
close it




[GitHub] spark pull request: Update CommandUtils.scala

2014-03-21 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/157




[GitHub] spark pull request: Update spark-daemon.sh

2014-03-21 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/197#issuecomment-38298607
  
Or delete the cd "$SPARK_PREFIX" line?




[GitHub] spark pull request: Update slaves.sh

2014-03-26 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/238

Update slaves.sh

update the comment

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #238


commit 0f08299625bf6135365e8127cb6cfbca2162c909
Author: baishuo(白硕) 
Date:   2014-03-26T16:03:45Z

Update slaves.sh

update the comment






[GitHub] spark pull request: Update slaves.sh

2014-03-26 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/238#issuecomment-38713391
  
Yeah, got it. Thank you @pwendell. Closing it.




[GitHub] spark pull request: Update slaves.sh

2014-03-26 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/238




[GitHub] spark pull request: Update spark-daemon.sh

2014-03-26 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/197




[GitHub] spark pull request: Update spark-daemon.sh

2014-03-26 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/197#issuecomment-38762615
  
close now




[GitHub] spark pull request: Update spark-daemon.sh

2014-03-27 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/254

Update spark-daemon.sh

I think we do not need 'cd "$SPARK_PREFIX"' to run spark-class. Am I right?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark aaa

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #254


commit 6a41627c57e842833a7b0d5f4e9f46b4e58fa1e7
Author: baishuo(白硕) 
Date:   2014-03-27T13:36:48Z

Update spark-daemon.sh

I think we do not need 'cd "$SPARK_PREFIX"' to run spark-class. Am I right?






[GitHub] spark pull request: Update GradientDescentSuite.scala

2014-03-27 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/256

Update GradientDescentSuite.scala

use a faster way to construct an Array

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark bbb

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #256


commit a0d82044d14ab0c20fc93fe25e14753a963a9170
Author: baishuo(白硕) 
Date:   2014-03-27T15:04:49Z

Update GradientDescentSuite.scala

use a faster way to construct an Array
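As a rough illustration of the intent (hypothetical data, not the suite's 
actual code), Array.fill builds the array in a single expression instead of 
allocating it and filling it element by element in a loop:

```scala
import scala.util.Random

// Loop-based construction (the style being replaced).
val r1 = new Random(42)
val slow = new Array[Double](5)
for (i <- 0 until 5) slow(i) = r1.nextDouble()

// Single-expression construction with Array.fill: the by-name argument
// is re-evaluated once per element.
val r2 = new Random(42)
val fast = Array.fill(5)(r2.nextDouble())

// With the same seed, both constructions produce identical contents.
assert(slow.sameElements(fast))
```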






[GitHub] spark pull request: Update WindowedDStream.scala

2014-04-11 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/390

Update WindowedDStream.scala

update the message of the exception thrown when windowDuration is not a 
multiple of parent.slideDuration

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark windowdstream

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #390


commit 533c96828cbc54ef7f8e061027bd31cb233b76be
Author: baishuo(白硕) 
Date:   2014-04-11T08:50:56Z

Update WindowedDStream.scala

update the message of the exception thrown when windowDuration is not a 
multiple of parent.slideDuration
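The check behind that exception can be sketched as follows (a simplified 
stand-in: Duration here is a plain millisecond wrapper and validateWindow an 
illustrative name, not Spark Streaming's exact code):

```scala
// Simplified stand-in for Spark Streaming's Duration (milliseconds).
case class Duration(ms: Long) {
  def isMultipleOf(that: Duration): Boolean = that.ms != 0 && ms % that.ms == 0
}

// A windowed DStream is only well-defined when the window length covers a
// whole number of parent slide intervals.
def validateWindow(windowDuration: Duration, parentSlide: Duration): Unit =
  require(
    windowDuration.isMultipleOf(parentSlide),
    s"The window duration (${windowDuration.ms} ms) must be a multiple of " +
      s"the slide duration of the parent DStream (${parentSlide.ms} ms)"
  )

validateWindow(Duration(4000), Duration(2000))  // passes: 4000 = 2 * 2000
// validateWindow(Duration(3000), Duration(2000)) would throw
// IllegalArgumentException with the message above.
```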






[GitHub] spark pull request: Update WindowedDStream.scala

2014-04-12 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/390#issuecomment-40275299
  
thank you @pwendell 




[GitHub] spark pull request: Update WindowedDStream.scala

2014-04-12 Thread baishuo
Github user baishuo closed the pull request at:

https://github.com/apache/spark/pull/390




[GitHub] spark pull request: Update WindowedDStream.scala

2014-04-12 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/390#issuecomment-40299063
  
no problem @pwendell 




[GitHub] spark pull request: Update ReducedWindowedDStream.scala

2014-04-16 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/425

Update ReducedWindowedDStream.scala

change _slideDuration to _windowDuration

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #425


commit 6f09ea1e6c2892a6f04a197931d4385a8c3cee2d
Author: baishuo(白硕) 
Date:   2014-04-16T09:42:09Z

Update ReducedWindowedDStream.scala






[GitHub] spark pull request: Update KafkaWordCount.scala

2014-04-23 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/523

Update KafkaWordCount.scala

modify the required number of arguments

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #523


commit 0368ba9a404cece382010d1020a698c29b20e964
Author: baishuo(白硕) 
Date:   2014-04-24T03:00:29Z

Update KafkaWordCount.scala

modify the required number of arguments






[GitHub] spark pull request: Update KafkaWordCount.scala

2014-04-24 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/523#issuecomment-41361697
  
I think at least 4 arguments are needed; am I right?




[GitHub] spark pull request: Update GradientDescentSuite.scala

2014-04-29 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/256#issuecomment-41657234
  
no problem @mengxr 




[GitHub] spark pull request: Update GradientDescentSuite.scala

2014-04-29 Thread baishuo
GitHub user baishuo opened a pull request:

https://github.com/apache/spark/pull/588

Update GradientDescentSuite.scala

use a faster way to construct an array

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/baishuo/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #588


commit b666d27bbdde653c98eeb9b8c96ad10d6fd2a110
Author: baishuo(白硕) 
Date:   2014-04-29T13:03:24Z

Update GradientDescentSuite.scala

use a faster way to construct an array






[GitHub] spark pull request: Update GradientDescentSuite.scala

2014-04-29 Thread baishuo
Github user baishuo commented on the pull request:

https://github.com/apache/spark/pull/256#issuecomment-41673035
  
Hi @mengxr, the new PR is https://github.com/apache/spark/pull/588. Please 
check whether it can be merged, thank you. 
Closing this PR.



