[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...

2014-09-03 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1121


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...

2014-09-03 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1121#issuecomment-54283956
  
OK, I think so too: a small third-party library may cause unpredictable
problems, so I will close this PR. If I come up with a more mature solution,
I will send a new PR and make a contribution. Best regards!





[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-09-03 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-54283740
  
Hi marmbrus, I will close it. Best regards.





[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-09-03 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1134





[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...

2014-09-03 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1127





[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...

2014-09-03 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1127#issuecomment-54283611
  
Hi marmbrus, got it. If I have another good idea, I will get in touch with
you. Thanks, I will close it later.





[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-08-29 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1306





[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-08-29 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1306#issuecomment-53863401
  
OK, got it. I will close this PR.





[GitHub] spark pull request: [SPARK-2239][SQL]Spark SQL basicOperator add D...

2014-07-23 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1145




[GitHub] spark pull request: [SPARK-2239][SQL]Spark SQL basicOperator add D...

2014-07-23 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1145#issuecomment-49954380
  
Got it, thanks.




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-11 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-48752406
  
Hi, I have rewritten the code and resolved some of the earlier problems.




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-11 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1134#discussion_r14811559
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -400,3 +401,73 @@ case class BroadcastNestedLoopJoin(
   streamedPlusMatches.flatMap(_._1), sqlContext.sparkContext.makeRDD(rightOuterMatches))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * In some case ,data skew happens.SkewJoin  sample the table rdd to find the largest key,
+ * then make the largest key rows as a table rdd.The left rdd will be made leftSkewedtable
+ * rdd without the largest key and the maxkeyskewedtable rdd with the largest key.
+ *  Then,join the two table  with the righttable.
+ * Finally,union the two result rdd.
+ */
+@DeveloperApi
+case class SkewJoinCartesianProduct(
+left: SparkPlan,
+right: SparkPlan,
+condition: Option[Expression])(@transient sc: SparkContext) extends BinaryNode {
+  override def output = left.output ++ right.output
+
+  @transient lazy val boundCondition =
+InterpretedPredicate(
+  condition
+  .map(c => BindReferences.bindReference(c, left.output ++ right.output))
+  .getOrElse(Literal(true)))
+
+  def execute() = {
+
+val skewedTable = left.execute()
+//This will later write as configuration
+val sample = skewedTable.sample(false, 0.3, 9).collect()
+val sortedSample = sample.sortWith((row1, row2) => row1.hashCode() > row2.hashCode())
--- End diff --

I want to sort the rows by key, so I need a better way to obtain the key.
Do you have a suggestion for how to fetch it?
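
One possible alternative to sorting the sample by hashCode is to count how
often each key occurs in the collected sample and take the most frequent one.
The following is only a sketch on plain Scala collections, not code from the
PR: SampleRow, keyOf and mostFrequentKey are hypothetical names, and in the
real operator the key would have to come from the join's own key extraction.

```scala
// Hypothetical stand-in for a SQL row: a join key plus some payload.
final case class SampleRow(key: String, payload: Int)

object MostFrequentKey {
  // Counts the occurrences of each key in the sample and returns the most
  // frequent key, or None if the sample is empty.
  def mostFrequentKey[K](sample: Seq[SampleRow], keyOf: SampleRow => K): Option[K] =
    if (sample.isEmpty) None
    else Some(
      sample
        .groupBy(keyOf)                                 // key -> rows with that key
        .map { case (k, rows) => (k, rows.size) }       // key -> occurrence count
        .maxBy(_._2)                                    // entry with the highest count
        ._1)
}
```

With ties, which of the tied keys is returned is unspecified, so a real
implementation would probably want an explicit tie-break.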




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-11 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-48707289
  
Hi, I also made a left semi join. I don't know whether it should be an
optimization of the existing left semi join or a separate join algorithm. I
think PR 1127 also still needs some optimization. Do you think PR 1127 is
worth merging? Thanks a lot.
https://github.com/apache/spark/pull/1127




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-11 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-48707013
  
Thanks Michael,
(1) We could make it a user hint, as Hive does:
set hive.optimize.skewjoin = true;
set hive.skewjoin.key = skew_key_threshold (default = 100000)
For Spark SQL we could use:
set sparksql.optimize.skewjoin=true
set sparksql.skewjoin.key=skew_key_threshold
(2) We could use sampling to estimate the relative frequency of each key, and
then use the user-set skew_key_threshold to decide which keys are over the
threshold.
(3) toString will generate many singleton objects.
I will optimize the code in the next step.
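
Point (2) could look roughly like the sketch below. It is an illustration on
plain Scala collections, not code from the PR: skewedKeys and its threshold
parameter are hypothetical names, and the threshold is assumed here to be a
relative share of the sample (between 0 and 1) rather than an absolute row
count.

```scala
object SkewKeyDetection {
  // Returns the keys whose share of the sample is strictly greater than
  // `threshold`. An empty sample yields an empty set.
  def skewedKeys[K](sampleKeys: Seq[K], threshold: Double): Set[K] = {
    val total = sampleKeys.size.toDouble
    sampleKeys
      .groupBy(identity)                                // key -> all occurrences
      .collect {
        case (k, occurrences) if occurrences.size / total > threshold => k
      }
      .toSet
  }
}
```

A relative share keeps the setting independent of the sample size; an absolute
count would need to be scaled by the sampling fraction.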





[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-07 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-48266319
  
Hi Michael, I also have a Skew Join PR that needs review and suggestions.
Could you help me review it? I have tested it and it passes the test suite.
Thanks a lot.

https://github.com/apache/spark/pull/1134




[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-07-07 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1306#issuecomment-48149122
  
This function is useful in some cases. For example, for the Skew Join in
another PR I need to split an RDD into two RDDs: one with the skewed keys and
one without.
val (maxKeySkewedTable, mainSkewedTable) = skewedTable.span(row => {
  skewSideKeyGenerator(row).toString().equals(maxrowKey.toString())
})
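
The overall split/join/union idea can be sketched on plain Scala collections
as follows. This is an illustration under stated assumptions, not the PR's
implementation: join and skewJoin are hypothetical helpers, rows are modelled
as (key, value) pairs, and the split uses the standard collection method
partition, because rows with the skewed key need not form a prefix.

```scala
object SkewJoinSketch {
  // Plain nested-loop equi-join on the first component of each pair.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, A, B)] =
    for ((lk, a) <- left; (rk, b) <- right if lk == rk) yield (lk, a, b)

  // Splits `left` into the rows carrying the skewed key and the remaining
  // rows, joins each part with `right`, and unions the two results.
  def skewJoin[K, A, B](
      left: Seq[(K, A)],
      right: Seq[(K, B)],
      skewedKey: K): Seq[(K, A, B)] = {
    val (skewedPart, mainPart) = left.partition { case (k, _) => k == skewedKey }
    join(skewedPart, right) ++ join(mainPart, right)
  }
}
```

The result contains the same rows as a direct join of left and right, only in
a possibly different order.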




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-06 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-48143209
  
Hi all, I have rewritten most of the code, and the test suite passes.




[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-07-05 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1306#issuecomment-48100374
  
Thanks. I optimized the code so that it only evaluates the function once. My
other comments are on JIRA.




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-05 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-48099853
  
Thanks a lot, I have reformatted the code.




[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-07-05 Thread YanjieGao
GitHub user YanjieGao opened a pull request:

https://github.com/apache/spark/pull/1306

[SPARK-2373]RDD add span function (split an RDD into two RDDs based on a
user's function)

def span(p: T => Boolean): (RDD[T], RDD[T])
Splits this RDD into a prefix/suffix pair according to a predicate.
Returns a pair consisting of the longest prefix of this RDD whose elements
all satisfy p, and the rest of this RDD.
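
For comparison, this is how the prefix/suffix semantics described above behave
on an ordinary Scala List, next to partition, which tests every element
independently. The example uses only standard collection methods; the object
and value names are made up for illustration.

```scala
object SpanVersusPartition {
  val xs = List(1, 2, 5, 1, 6)

  // span stops at the first element that fails the predicate, so the second
  // 1 stays in the suffix: (List(1, 2), List(5, 1, 6))
  val (prefix, rest) = xs.span(_ < 3)

  // partition checks each element on its own: (List(1, 2, 1), List(5, 6))
  val (matching, others) = xs.partition(_ < 3)
}
```

The two results only coincide when all matching elements come first, which
depends on element order.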

JIRA:https://issues.apache.org/jira/browse/SPARK-2373

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanjieGao/spark rdd_span

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1306.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1306


commit e5bff618f09b0b33968e4c12b360e3d30f2878f9
Author: Yanjie Gao 
Date:   2014-06-20T07:20:12Z

Spark SQL basicOperator add Intersect operator

Hi all,
I want to submit a basic operator, Intersect.
For example, in SQL:
select * from table1
intersect
select * from table2
I want this operator to support that query in Spark SQL.
The operator returns the intersection of the child SparkPlans' table RDDs.

commit 469f099c510b20d0871c2a22927e65b48c968964
Author: Yanjie Gao 
Date:   2014-06-23T08:04:33Z

Update basicOperators.scala

commit 61e88e7db2c118023fe501a7381c51c3da7f3940
Author: Yanjie Gao 
Date:   2014-06-23T08:08:03Z

Update SqlParser.scala

commit d4ac5e559485e6f948100a7e6875831b7a7b46a4
Author: Yanjie Gao 
Date:   2014-06-23T08:10:11Z

Update HiveQl.scala

commit ac73e60ef80ca78b2bc63d0ecc45f4b2a963d13c
Author: Yanjie Gao 
Date:   2014-06-23T08:11:45Z

Update basicOperators.scala

commit 790765d915e7325a7dfdb46780c37a5e7b0bdf31
Author: Yanjie Gao 
Date:   2014-06-23T08:14:05Z

Update SparkStrategies.scala

commit 4dd453e2bf0d85b6cbfdfe703be403b83858818c
Author: Yanjie Gao 
Date:   2014-06-23T08:17:20Z

Update SQLQuerySuite.scala

commit e2b64be1a643d43748c90cfb341177f1157db15d
Author: Yanjie Gao 
Date:   2014-06-24T03:14:45Z

Update basicOperators.scala

commit f1288b46bb031fd34ed8d0217bcb4144d720d880
Author: Yanjie Gao 
Date:   2014-06-27T08:54:03Z

delete annotation

commit 0b4983723d39488b8a2ce7f3e13f5bdb1d25ac83
Author: Yanjie Gao 
Date:   2014-06-27T08:56:02Z

delete the annotation

commit bdc4a05f46f8dfdee7442be0230901cb7d1ef864
Author: Yanjie Gao 
Date:   2014-06-27T08:56:28Z

Update basicOperators.scala

commit f7961f6b9f839d58f5c5b1caf9702cd1e688fee7
Author: Yanjie Gao 
Date:   2014-06-27T10:29:33Z

update the line less than

commit 5e374c754c471e2a3485a4fcb68ba26f3af5dfbd
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-03T06:07:45Z

resolve conflict in SparkStrategies and basicOperator

commit a802ca88e1dda66e116af01520cadb965036c455
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-04T03:25:51Z

Merge remote branch 'upstream/master' into patch-5

commit 0c7cca5ea9c3e68758675c493570be87b38d346a
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-04T03:58:28Z

modify format problem

commit ea78f3397ce645f0680a59b814f8472db91c6adb
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-04T10:27:34Z

resolve conflict and add annotation on basicOperator and remove HiveQl

commit 1cfbfe6593ef939182d99481384cb1adb5990ad2
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-04T10:43:54Z

reformat some files

commit b1a641cc1697da147ef47a8d1fb7a61b5f6c5990
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-05T07:31:07Z

Merge remote branch 'upstream/master' into rdd_span

commit 8c4eafe3dd91810eb63287bc7d59e9cf9569956b
Author: YanjieGao <396154...@qq.com>
Date:   2014-07-05T07:40:28Z

RDD add span function






[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-48078447
  
Thanks, I added three blanks after that line, and now there are no red lines.




[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-07-04 Thread YanjieGao
Github user YanjieGao closed the pull request at:

https://github.com/apache/spark/pull/1115




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-48030332
  
Hi Michael, I have resolved the conflict.




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1150#discussion_r14555293
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -274,7 +274,7 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   case Unions(unionChildren) =>
 execution.Union(unionChildren.map(planLater))(sqlContext) :: Nil
   case logical.Except(left,right) =>   
 
-execution.Except(planLater(left),planLater(right)) :: Nil   
--- End diff --

I don't know why this line is always red.




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1150#discussion_r14553981
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -205,3 +205,15 @@ object ExistingRdd {
 case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode {
   override def execute() = rdd
 }
+
+/**
+ * :: DeveloperApi ::
--- End diff --

OK, I will modify it.





[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1150#discussion_r14553975
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -623,6 +623,8 @@ private[hive] object HiveQl {
   queries.reduceLeft(Union)
 
 case Token("TOK_UNION", left :: right :: Nil) => Union(nodeToPlan(left), nodeToPlan(right))
+
+case Token("TOK_INTERSECT", left :: right :: Nil) => Intersect(nodeToPlan(left), nodeToPlan(right))
--- End diff --

Hive doesn't support this, so I will remove it.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-07-04 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-48026439
  
Hi Michael, I also have a similar PR, #1150 [SPARK-2235][SQL]Spark SQL
basicOperator add Intersect operator. Could you help me review it? Thanks!




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-07-03 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-48005168
  
Hi all, all tests for this PR have passed.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-07-03 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-48001405
  
Thanks a lot, all. I have modified the code as you suggested. Next time I
will follow the Spark coding style.




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-47870392
  
Hi all, I have resolved the conflict.




[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1127#issuecomment-47869475
  
Hi all, I have resolved the conflict. I don't know whether this PR is worth
merging.




[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1121#issuecomment-47869431
  
Hi all, I have resolved the conflict. I don't know whether this PR is worth
merging. Thanks a lot.




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-47869343
  
Hi all, I have resolved the conflict.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47864285
  
Thanks a lot, Michael. I have modified the code. The merge build took two
hours, but I saw an error in the console test log. I don't know whether the
new code is the main cause, or whether the code can be merged.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-07-02 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47774796
  
Hi all, I have resolved the conflict and merged with master.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-29 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47480949
  
Hi all, I have modified the files and updated the code as you suggested. The
build was triggered, but the PR was not merged. I don't know the main reason
it was not merged. Thanks a lot.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-27 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47329035
  
Thanks, I have modified the line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-26 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47303095
  
Hi all, what should I do next? Thanks!




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-26 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-47302963
  
Hi Michael, adrian-wang, thanks a lot. I have updated all the files as you
suggested!





[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-06-25 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-47173420
  
Hi all, I updated 8 files, following the pull request that adds the EXCEPT 
operator. But when I run the test, it executes the CartesianProduct case class 
operator instead. I think there are some mistakes in my code. Can you help me? 
Thanks a lot!




[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin

2014-06-24 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-46938049
  
Thanks a lot, Chenghao. This code is more of a demo. I think we could improve 
the sampling phase and use some strategy to judge which keys are skewed, for 
example by an absolute rate or a relative rate. What are your suggestions?
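
The two rules mentioned above (absolute rate and relative rate) could be sketched roughly as follows. This is only an illustration in plain Scala with no Spark dependency, not the code in this pull request; `SkewKeys`, `keyCounts`, `skewedByAbsolute`, `skewedByRelative` and both thresholds are hypothetical names chosen for the example.

```scala
object SkewKeys {
  // Count how often each join key appears in a sample of keys.
  def keyCounts[K](sample: Seq[K]): Map[K, Int] =
    sample.groupBy(identity).map { case (k, vs) => k -> vs.size }

  // Absolute rule (assumed): a key is skewed if it appears at least
  // minCount times in the sample.
  def skewedByAbsolute[K](sample: Seq[K], minCount: Int): Set[K] =
    keyCounts(sample).filter { case (_, c) => c >= minCount }.keySet.toSet

  // Relative rule (assumed): a key is skewed if it makes up at least
  // minFraction of the whole sample.
  def skewedByRelative[K](sample: Seq[K], minFraction: Double): Set[K] = {
    val total = sample.size.toDouble
    if (total == 0) Set.empty[K]
    else keyCounts(sample).filter { case (_, c) => c / total >= minFraction }.keySet.toSet
  }
}
```

The absolute rule is simple but depends on the sample size, while the relative rule stays meaningful when the sample size changes; which one fits better is an open question in this thread.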




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-23 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-46890516
  
Thanks a lot, Michael. I have updated the code. I don't know whether all the 
files are right. Can these files be merged? Thanks a lot!




[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...

2014-06-23 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1150#issuecomment-46817079
  
Hi all, I have finished the INTERSECT operator and have updated the 6 files 
this operator needs.
Now I need your help to review the code. Thanks a lot!




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-23 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-46814099
  
Hi all, I have finished this Subtract operator.
The code compiles and runs, but it needs to be reviewed. Thanks a lot!
(1) Because there is a name conflict in SqlParser, I changed the name to 
Except in that file.
(2) I tried the test suite and the result is strange. After trying some other 
test suites, I found that the cause may be the rdd.subtract operator. I think 
that operator may have a bug.
You can use this case:
checkAnswer(
  sql("SELECT * FROM lowerCaseData EXCEPT SELECT * FROM upperCaseData "),
  (1, "a") ::
  (2, "b") ::
  (3, "c") ::
  (4, "d") :: Nil)


If you debug, you will find that all the rows in the RDD become the same row.
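
One possible explanation for the "same row" symptom, which is an assumption and is not confirmed anywhere in this thread, is that the rows come from an iterator that reuses a single mutable row object. The plain-Scala sketch below has no Spark dependency, and `RowReuse`, `rows`, `collectShared` and `collectCopied` are hypothetical names. It shows how materialising such an iterator without copying leaves every element pointing at the last row, and how a per-row copy avoids that.

```scala
object RowReuse {
  // Hypothetical producer that writes every row into one shared buffer
  // and hands out that same buffer each time.
  def rows(data: Seq[(Int, String)]): Iterator[Array[Any]] = {
    val buffer = new Array[Any](2)
    data.iterator.map { case (id, name) =>
      buffer(0) = id
      buffer(1) = name
      buffer // the same object on every call
    }
  }

  // No copy: the list holds N references to one buffer, so after the
  // iterator is exhausted every element shows the last row.
  def collectShared(data: Seq[(Int, String)]): List[List[Any]] =
    rows(data).toList.map(_.toList)

  // Copy each row as it is produced: every element keeps its own values.
  def collectCopied(data: Seq[(Int, String)]): List[List[Any]] =
    rows(data).map(_.clone()).toList.map(_.toList)
}
```

If the operator under test behaves like `collectShared`, copying each row before it is buffered or shuffled may be worth trying.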





[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-22 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1151#discussion_r14060932
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -119,6 +119,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val UNCACHE = Keyword("UNCACHE")
   protected val UNION = Keyword("UNION")
   protected val WHERE = Keyword("WHERE")
+  protected val SUBTRACT = Keyword("SUBTRACT")
--- End diff --

Thanks, I have corrected it.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-22 Thread YanjieGao
Github user YanjieGao commented on a diff in the pull request:

https://github.com/apache/spark/pull/1151#discussion_r14060925
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -369,6 +369,17 @@ class SQLQuerySuite extends QueryTest {
 (3, null)))
   }
 
+ test("subtract") {
+checkAnswer(
+  sql("SELECT * FROM lowerCaseData SUBTRACT SELECT * FROM upperCaseData "),
+  (1, "a") ::
+  (2, "b") ::
+  (3, "c") ::
+  4, "d") :: Nil)
--- End diff --

Thanks, I have corrected it.




[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...

2014-06-22 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-46800258
  
Hi marmbrus,
I updated these files following your comments, but I think I may have made 
some mistakes in the code. Could you help me and give me some tips? I will 
continue to work on it and debug it to make it better.
Thanks a lot!




[GitHub] spark pull request: Spark SQL add LeftSemiBloomFilterBroadcastJoin

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1127#issuecomment-46769416
  
Thanks a lot. My IntelliJ settings are shown in the two images below. I think 
there may be an error in my settings.
Can you help me find what is inconsistent with your IDE? I want to follow the 
right code style. Thanks.

![image](https://cloud.githubusercontent.com/assets/5576848/3350382/2370ad04-f9a9-11e3-9625-b1bfd59d6580.png)


![image](https://cloud.githubusercontent.com/assets/5576848/3350383/3ccb3d0a-f9a9-11e3-8281-80de3b5e8b6b.png)






[GitHub] spark pull request: Spark SQL basicOperators add Except operator

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-46769120
  
Thanks a lot, it's very nice of you. I will work on it and then add code in 
the other files. I have some questions about the syntax, and I have sent you 
an email about them. Thanks a lot!




[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1115#issuecomment-46755267
  
Thanks a lot, I will do as you said. Previously I submitted it this way: I 
forked the Spark repository on the web, wrote the code and ran it in IntelliJ, 
then edited the Scala file on the web page to add the new code, and committed 
it on the web page. I don't know whether updating code this way is right or 
not. Thanks!




[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1115#issuecomment-46754558
  
Hi Srowen, markhamstra,
I want to merge this into the master branch. Last time I made a mistake.
I resubmitted this patch in https://github.com/apache/spark/pull/1121
I don't know if this is right. Can you give me some other suggestions?




[GitHub] spark pull request: Spark SQL add LeftSemiBloomFilterBroadcastJoin

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1127#issuecomment-46754487
  
Hi Zongheng, I reformatted the code. I don't know if that is OK, and I hope 
you can give me more suggestions. Thanks a lot.




[GitHub] spark pull request: Spark SQL basicOperators add Except operator

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1151#issuecomment-46754425
  
Hi Zongheng, I tried it, and tried adding code like the other operators. To 
add this Except operator, do I need to add or modify code in other Scala 
files? Thanks a lot.




[GitHub] spark pull request: SparkSQL add SkewJoin

2014-06-21 Thread YanjieGao
Github user YanjieGao commented on the pull request:

https://github.com/apache/spark/pull/1134#issuecomment-46754360
  
Hi rxin, I reformatted it. Can you give me some suggestions? I will try to 
make it better.




[GitHub] spark pull request: Spark SQL basicOperators add Except operator

2014-06-20 Thread YanjieGao
GitHub user YanjieGao opened a pull request:

https://github.com/apache/spark/pull/1151

Spark SQL basicOperators add Except operator

Hi all,
I want to submit an Except operator in basicOperators.scala.
SQL supports an except operation between two tables:
select * from table1
except
select * from table2
This operator supports the subtract function. It returns a table with the 
elements from `this` that are not in `other`.
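
The behaviour described above ("elements from `this` that are not in `other`") can be sketched in plain Scala without any Spark dependency. This is only an illustration of the semantics, not the operator submitted in this pull request; `ExceptSketch`, `Row` and `except` are hypothetical names.

```scala
object ExceptSketch {
  // A row is modelled here as a plain sequence of column values (assumption).
  type Row = Seq[Any]

  // Keep every row of `left` that does not appear in `right`.
  // Duplicates in `left` are kept, following the description above.
  def except(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val exclude = right.toSet
    left.filterNot(exclude.contains)
  }
}
```

Note that standard SQL `EXCEPT` (without `ALL`) also removes duplicate rows from the result, which this sketch does not do; adding `.distinct` to the result would give that behaviour.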

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanjieGao/spark patch-6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1151.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1151


commit 4eb43ec21a6bf890f52c61dd1502c8a7893d3824
Author: Yanjie Gao 
Date:   2014-06-20T07:57:28Z

Update basicOperators.scala

Hi all,
I want to submit an Except operator in basicOperators.scala.
SQL supports an except operation between two tables:
select * from table1
except
select * from table2
This operator supports the subtract function. It returns a table with the 
elements from `this` that are not in `other`.






[GitHub] spark pull request: Spark SQL basicOperator add Intersect operator

2014-06-20 Thread YanjieGao
GitHub user YanjieGao opened a pull request:

https://github.com/apache/spark/pull/1150

Spark SQL basicOperator add Intersect operator

Hi all,
I want to submit a basic operator, Intersect.
For example, in SQL:
select * from table1
intersect
select * from table2
I want this operator to support this function in Spark SQL.
This operator will return the intersection of the RDDs of the child SparkPlan 
tables.
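
As a rough illustration of the intersection semantics described above, here is a plain-Scala sketch with no Spark dependency. It is not the operator submitted in this pull request; `IntersectSketch`, `Row` and `intersect` are hypothetical names.

```scala
object IntersectSketch {
  // A row is modelled here as a plain sequence of column values (assumption).
  type Row = Seq[Any]

  // Keep the rows of `left` that also appear in `right`, without duplicates,
  // in the order they first appear in `left`.
  def intersect(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val keep = right.toSet
    left.filter(keep.contains).distinct
  }
}
```

The `.distinct` call gives set semantics, which matches both SQL `INTERSECT` (without `ALL`) and `RDD.intersection`, whose output contains no duplicate elements.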

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanjieGao/spark patch-5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1150


commit e5bff618f09b0b33968e4c12b360e3d30f2878f9
Author: Yanjie Gao 
Date:   2014-06-20T07:20:12Z

Spark SQL basicOperator add Intersect operator

Hi all,
I want to submit a basic operator, Intersect.
For example, in SQL:
select * from table1
intersect
select * from table2
I want this operator to support this function in Spark SQL.
This operator will return the intersection of the RDDs of the child SparkPlan 
tables.



