[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1121 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1121#issuecomment-54283956 OK, I think so too. A small third-party lib may cause unpredictable problems, so I will close this PR. If I have a more mature solution, I will send a PR and make a contribution. Best regards!
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-54283740 Hi marmbrus, I will close it. Best regards
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1134
[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1127
[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1127#issuecomment-54283611 Hi marmbrus, got it. If I have another good idea I will try to discuss it with you. Thanks, I will close it later.
[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1306
[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1306#issuecomment-53863401 OK, got it. I will close this PR.
[GitHub] spark pull request: [SPARK-2239][SQL]Spark SQL basicOperator add D...
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1145
[GitHub] spark pull request: [SPARK-2239][SQL]Spark SQL basicOperator add D...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1145#issuecomment-49954380 Got it, thanks.
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-48752406 Hi, I rewrote the code and resolved some of the earlier problems.
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1134#discussion_r14811559
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---
@@ -400,3 +401,73 @@ case class BroadcastNestedLoopJoin(
       streamedPlusMatches.flatMap(_._1), sqlContext.sparkContext.makeRDD(rightOuterMatches))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ * In some case ,data skew happens.SkewJoin sample the table rdd to find the largest key,
+ * then make the largest key rows as a table rdd.The left rdd will be made leftSkewedtable
+ * rdd without the largest key and the maxkeyskewedtable rdd with the largest key.
+ * Then,join the two table with the righttable.
+ * Finally,union the two result rdd.
+ */
+@DeveloperApi
+case class SkewJoinCartesianProduct(
+    left: SparkPlan,
+    right: SparkPlan,
+    condition: Option[Expression])(@transient sc: SparkContext) extends BinaryNode {
+  override def output = left.output ++ right.output
+
+  @transient lazy val boundCondition =
+    InterpretedPredicate(
+      condition
+        .map(c => BindReferences.bindReference(c, left.output ++ right.output))
+        .getOrElse(Literal(true)))
+
+  def execute() = {
+
+    val skewedTable = left.execute()
+    //This will later write as configuration
+    val sample = skewedTable.sample(false, 0.3, 9).collect()
+    val sortedSample = sample.sortWith((row1, row2) => row1.hashCode() > row2.hashCode())
--- End diff --
I want to sort the rows by key, so I think I need a better way to obtain the key. Do you know a better way to fetch the key?
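The question above is how to find the dominant key in the sample without sorting rows by `hashCode`. One possible approach, sketched here with plain Scala collections rather than Spark rows, is to count occurrences per key and take the maximum. The names `SkewKeySample`, `mostFrequentKey`, and `keyOf` are hypothetical and are not part of the pull request or of Spark; how the key is extracted from a real `Row` is left as an assumption.

```scala
// Hypothetical helper, not part of Spark or of the pull request:
// find the most frequent join key in a collected sample.
object SkewKeySample {
  /** Returns the key that occurs most often in `sample`, or None if the sample is empty. */
  def mostFrequentKey[R, K](sample: Seq[R])(keyOf: R => K): Option[K] =
    if (sample.isEmpty) None
    else {
      // Count occurrences per key instead of ordering rows by hashCode.
      val counts: Map[K, Int] =
        sample.groupBy(keyOf).map { case (key, rows) => key -> rows.size }
      Some(counts.maxBy(_._2)._1)
    }
}
```

For instance, `SkewKeySample.mostFrequentKey(Seq(("a", 1), ("b", 2), ("a", 3)))(_._1)` picks the key `"a"`. When two keys tie for the maximum, which one is returned is unspecified.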
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-48707289 Hi, I also made a left semi join. I don't know whether that join should be an optimization of the left semi join or a separate join algorithm. I think PR 1127 also needs some optimization. Do you think PR 1127 is worth merging? Thanks a lot. https://github.com/apache/spark/pull/1127
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-48707013 Thanks Michael,
(1) We could make it a user hint, like Hive does:
set hive.optimize.skewjoin = true;
set hive.skewjoin.key = skew_key_threshold (default = 10)
We could use:
set sparksql.optimize.skewjoin=true
set sparksql.skewjoin.key=skew_key_threshold
(2) We could use a sample to find the relative number of rows for each key, and use the user-set skew_key_threshold to judge which keys are over the threshold.
(3) toString will generate many singleton objects. I will optimize the code in the next step.
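Point (2) above can be illustrated with a small sketch on plain Scala collections. It assumes the threshold is expressed as a fraction of the sample, which is only one reading of "relative number"; Hive's setting and the proposed `sparksql.skewjoin.key` may instead be absolute row counts. The names `SkewThreshold`, `skewedKeys`, and `minFraction` are hypothetical.

```scala
// Hypothetical sketch: flag keys whose share of the sample reaches a threshold.
object SkewThreshold {
  /** Keys whose relative frequency in `sampleKeys` is at least `minFraction`. */
  def skewedKeys[K](sampleKeys: Seq[K], minFraction: Double): Set[K] = {
    val total = sampleKeys.size.toDouble
    sampleKeys
      .groupBy(identity)
      // Keep a key only when its fraction of the sample reaches the threshold.
      .collect { case (key, occurrences) if occurrences.size / total >= minFraction => key }
      .toSet
  }
}
```

With the sample `Seq("a", "a", "a", "b", "c", "c")`, a `minFraction` of 0.4 flags only `"a"`, while 0.3 flags both `"a"` and `"c"`.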
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-48266319 Hi Michael, I also have a Skew Join PR that needs review and some suggestions. Could you help me review it? I have tested that it passes the test suite. Thanks a lot. https://github.com/apache/spark/pull/1134
[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1306#issuecomment-48149122 This function is useful in some cases. For example, when I do the Skew Join in another PR, I need to split an RDD into two RDDs: one with the skewed keys and one without.
val (maxKeySkewedTable, mainSkewedTable) = skewedTable.span(row => {
  skewSideKeyGenerator(row).toString().equals(maxrowKey.toString())
})
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-48143209 Hi all, I rewrote most of the code and the test suite passes.
[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1306#issuecomment-48100374 Thanks, I optimized the code so it only evaluates the function once. Other comments are on JIRA.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-48099853 Thanks a lot, I have reformatted the code style.
[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...
GitHub user YanjieGao opened a pull request: https://github.com/apache/spark/pull/1306
[SPARK-2373]RDD add span function (split an RDD to two RDD based on user's function)
def span(p: T => Boolean): (RDD[T], RDD[T])
Splits this RDD into a prefix/suffix pair according to a predicate. Returns a pair consisting of the longest prefix of this RDD whose elements all satisfy p, and the rest of this RDD.
JIRA: https://issues.apache.org/jira/browse/SPARK-2373
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/YanjieGao/spark rdd_span
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1306.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1306
commit e5bff618f09b0b33968e4c12b360e3d30f2878f9 Author: Yanjie Gao Date: 2014-06-20T07:20:12Z Spark SQL basicOperator add Intersect operator Hi all, I want to submit a basic operator Intersect For example , in sql case select * from table1 intersect select * from table2 So ,i want use this operator support this function in Spark SQL This operator will return the the intersection of SparkPlan child table RDD .
commit 469f099c510b20d0871c2a22927e65b48c968964 Author: Yanjie Gao Date: 2014-06-23T08:04:33Z Update basicOperators.scala
commit 61e88e7db2c118023fe501a7381c51c3da7f3940 Author: Yanjie Gao Date: 2014-06-23T08:08:03Z Update SqlParser.scala
commit d4ac5e559485e6f948100a7e6875831b7a7b46a4 Author: Yanjie Gao Date: 2014-06-23T08:10:11Z Update HiveQl.scala
commit ac73e60ef80ca78b2bc63d0ecc45f4b2a963d13c Author: Yanjie Gao Date: 2014-06-23T08:11:45Z Update basicOperators.scala
commit 790765d915e7325a7dfdb46780c37a5e7b0bdf31 Author: Yanjie Gao Date: 2014-06-23T08:14:05Z Update SparkStrategies.scala
commit 4dd453e2bf0d85b6cbfdfe703be403b83858818c Author: Yanjie Gao Date: 2014-06-23T08:17:20Z Update SQLQuerySuite.scala
commit e2b64be1a643d43748c90cfb341177f1157db15d Author: Yanjie Gao Date: 2014-06-24T03:14:45Z Update basicOperators.scala
commit f1288b46bb031fd34ed8d0217bcb4144d720d880 Author: Yanjie Gao Date: 2014-06-27T08:54:03Z delete annotation
commit 0b4983723d39488b8a2ce7f3e13f5bdb1d25ac83 Author: Yanjie Gao Date: 2014-06-27T08:56:02Z delete the annotation
commit bdc4a05f46f8dfdee7442be0230901cb7d1ef864 Author: Yanjie Gao Date: 2014-06-27T08:56:28Z Update basicOperators.scala
commit f7961f6b9f839d58f5c5b1caf9702cd1e688fee7 Author: Yanjie Gao Date: 2014-06-27T10:29:33Z update the line less than
commit 5e374c754c471e2a3485a4fcb68ba26f3af5dfbd Author: YanjieGao <396154...@qq.com> Date: 2014-07-03T06:07:45Z resolve conflict in SparkStrategies and basicOperator
commit a802ca88e1dda66e116af01520cadb965036c455 Author: YanjieGao <396154...@qq.com> Date: 2014-07-04T03:25:51Z Merge remote branch 'upstream/master' into patch-5
commit 0c7cca5ea9c3e68758675c493570be87b38d346a Author: YanjieGao <396154...@qq.com> Date: 2014-07-04T03:58:28Z modify format problem
commit ea78f3397ce645f0680a59b814f8472db91c6adb Author: YanjieGao <396154...@qq.com> Date: 2014-07-04T10:27:34Z resolve conflict and add annotation on basicOperator and remove HiveQl
commit 1cfbfe6593ef939182d99481384cb1adb5990ad2 Author: YanjieGao <396154...@qq.com> Date: 2014-07-04T10:43:54Z refomat some files
commit b1a641cc1697da147ef47a8d1fb7a61b5f6c5990 Author: YanjieGao <396154...@qq.com> Date: 2014-07-05T07:31:07Z Merge remote branch 'upstream/master' into rdd_span
commit 8c4eafe3dd91810eb63287bc7d59e9cf9569956b Author: YanjieGao <396154...@qq.com> Date: 2014-07-05T07:40:28Z RDD add span function
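The pull request describes `span` with prefix/suffix semantics, as in the Scala collections library, while the skew-join use case needs every matching row regardless of position. The difference can be seen on an ordinary `Seq`; this sketch uses only standard Scala collection methods and makes no claim about how the proposed `RDD.span` was implemented. The object name `SpanVsPartition` is hypothetical.

```scala
// Standard Scala collections only; illustrates span versus partition semantics.
object SpanVsPartition {
  val rows: Seq[Int] = Seq(1, 2, 5, 1)

  // span: the longest prefix whose elements satisfy the predicate, then everything after it.
  val (prefix, rest) = rows.span(_ < 3)

  // partition: all elements that satisfy the predicate, then all that do not.
  val (matching, nonMatching) = rows.partition(_ < 3)
}
```

Here `span` stops at the first element that fails the predicate, so the trailing `1` ends up in `rest`, whereas `partition` collects it into `matching`. Splitting a table into skewed and non-skewed rows corresponds to the `partition` behavior.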
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-48078447 Thanks, I added three blanks after that line. Now there are no red lines.
[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code
Github user YanjieGao closed the pull request at: https://github.com/apache/spark/pull/1115
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-48030332 Hi Michael, I have resolved the conflict.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1150#discussion_r14555293
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -274,7 +274,7 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
       case Unions(unionChildren) =>
         execution.Union(unionChildren.map(planLater))(sqlContext) :: Nil
       case logical.Except(left,right) =>
-        execution.Except(planLater(left),planLater(right)) :: Nil
--- End diff --
I don't know why this line is always red.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1150#discussion_r14553981
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -205,3 +205,15 @@ object ExistingRdd {
 case class ExistingRdd(output: Seq[Attribute], rdd: RDD[Row]) extends LeafNode {
   override def execute() = rdd
 }
+
+/**
+ * :: DeveloperApi ::
--- End diff --
OK, I will modify it.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1150#discussion_r14553975
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -623,6 +623,8 @@ private[hive] object HiveQl {
         queries.reduceLeft(Union)
     case Token("TOK_UNION", left :: right :: Nil) => Union(nodeToPlan(left), nodeToPlan(right))
+
+    case Token("TOK_INTERSECT", left :: right :: Nil) => Intersect(nodeToPlan(left), nodeToPlan(right))
--- End diff --
Hive doesn't support it, so I will remove this.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-48026439 Hi Michael, I also have a similar PR, #1150 [SPARK-2235][SQL]Spark SQL basicOperator add Intersect operator. Could you help me review it? Thanks!
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-48005168 Hi all, all tests for this PR have passed.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-48001405 Thanks a lot, all. I have modified the code as you suggested. Next time I will follow the Spark coding style.
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-47870392 Hi all, I have resolved the conflict.
[GitHub] spark pull request: [SPARK-2240][SQL]Spark SQL add LeftSemiBloomFi...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1127#issuecomment-47869475 Hi all, I have resolved the conflict. I don't know whether this PR is worth merging.
[GitHub] spark pull request: [SPARK-2237][CORE]Add ZLIBCompressionCodec cod...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1121#issuecomment-47869431 Hi all, I have resolved the conflict. I don't know whether this PR is worth merging. Thanks a lot.
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-47869343 Hi all, I have resolved the conflict.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47864285 Thanks a lot, Michael. I have modified the code. The merge build took two hours, but I saw an error in the console test log. I don't know whether the new code is the main cause, or whether the code can be merged.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47774796 Hi all, I have resolved the conflict and merged with master.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47480949 Hi all, I have modified the files and updated the code as you suggested. The build was triggered, but the PR was not merged. I don't know the main reason it was not merged. Thanks a lot.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47329035 Thanks, I have modified the line.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47303095 Hi all, what should I do next? Thanks!
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-47302963 Hi Michael, adrian-wang, thanks a lot. I have updated all the files as you suggested!
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-47173420 Hi all, I updated 8 files, following the pull request that added the EXCEPT operator. But when I run the test, it executes the CartesianProduct case class operator. I think there are some mistakes in my code. Can you help me? Thanks a lot!
[GitHub] spark pull request: [SPARK-2236][SQL]SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-46938049 Thanks a lot, Chenghao. This code is more of a demo. I think we could improve the sampling phase and use some strategy to judge which set of keys is skewed, based on either an absolute rate or a relative rate. What are your suggestions?
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46890516 Thanks a lot, Michael. I have updated the code. I don't know whether all the files are right. Can these files be merged? Thanks a lot!
[GitHub] spark pull request: [SPARK-2235][SQL]Spark SQL basicOperator add I...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1150#issuecomment-46817079 Hi all, I have finished the INTERSECT operator and updated the 6 files it needs. Now I need your help reviewing the code. Thanks a lot!
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46814099

Hi all, I have finished this Subtract operator. The code compiles and runs, and it needs to be reviewed. Thanks a lot!

(1) Because the name conflicts with an existing one in SqlParser, I changed the name to Except in that file.

(2) I ran the test suite and the result is strange. After trying some other test suites, I found that the cause may be the rdd.subtract operator. I think that operator may have a bug. You can use this case:

    checkAnswer(
      sql("SELECT * FROM lowerCaseData EXCEPT SELECT * FROM upperCaseData "),
      (1, "a") ::
      (2, "b") ::
      (3, "c") ::
      (4, "d") :: Nil)

If you debug it, you will find that all the rows in the RDD become the same row.
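For reference, the expected answer in that test case can be reproduced on plain Scala collections, without Spark. This is only a sketch: the contents of `lowerCaseData` and `upperCaseData` below are assumptions about the test fixtures (they are not quoted in this thread), and `ExceptExpectation` is a hypothetical name.

```scala
object ExceptExpectation {
  // Assumed fixture contents; the thread does not show them.
  val lowerCaseData: Seq[(Int, String)] =
    Seq((1, "a"), (2, "b"), (3, "c"), (4, "d"))
  val upperCaseData: Seq[(Int, String)] =
    Seq((1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F"))

  // Rows are compared as whole tuples, and string comparison is
  // case-sensitive, so no lower-case row equals an upper-case row.
  private val other: Set[(Int, String)] = upperCaseData.toSet

  // Rows of lowerCaseData that do not appear in upperCaseData.
  val expected: Seq[(Int, String)] =
    lowerCaseData.filterNot(row => other.contains(row)).distinct
}
```

Under these assumptions nothing is removed, which matches the four rows that the `checkAnswer` call expects.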
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060932

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
    @@ -119,6 +119,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
       protected val UNCACHE = Keyword("UNCACHE")
       protected val UNION = Keyword("UNION")
       protected val WHERE = Keyword("WHERE")
    +  protected val SUBTRACT = Keyword("SUBTRACT")
    --- End diff --

Thanks, I have corrected it.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on a diff in the pull request: https://github.com/apache/spark/pull/1151#discussion_r14060925

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -369,6 +369,17 @@ class SQLQuerySuite extends QueryTest {
           (3, null)))
       }
    +  test("subtract") {
    +checkAnswer(
    +  sql("SELECT * FROM lowerCaseData SUBTRACT SELECT * FROM upperCaseData "),
    +  (1, "a") ::
    +  (2, "b") ::
    +  (3, "c") ::
    +  4, "d") :: Nil)
    --- End diff --

Thanks, I have corrected it.
[GitHub] spark pull request: [SPARK-2234][SQL]Spark SQL basicOperators add ...
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46800258 Hi marmbrus, I updated these files as your comments suggested, but I think I may have made some mistakes in the code. Could you help me and give me some tips? I will continue working on it and debugging it to make it better. Thanks a lot!
[GitHub] spark pull request: Spark SQL add LeftSemiBloomFilterBroadcastJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1127#issuecomment-46769416 Thanks a lot. My IntelliJ settings are shown in the two images below. I think I may have some errors in the settings. Can you help me find what differs from your IDE? I want to follow the right code style. Thanks. ![image](https://cloud.githubusercontent.com/assets/5576848/3350382/2370ad04-f9a9-11e3-9625-b1bfd59d6580.png) ![image](https://cloud.githubusercontent.com/assets/5576848/3350383/3ccb3d0a-f9a9-11e3-8281-80de3b5e8b6b.png)
[GitHub] spark pull request: Spark SQL basicOperators add Except operator
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46769120 Thanks a lot, it's very nice of you. I will work on it and then add code in the other files. I have some problems with some of the syntax, and I have sent you an email about them. Thanks a lot!
[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1115#issuecomment-46755267 Thanks a lot, I will do as you said. Previously I submitted it like this: I forked the Spark repository on the web, wrote and ran the code in IntelliJ, then edited the Scala file on the web page to add the new code and committed it there. I don't know whether updating code this way is right. Thanks!
[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1115#issuecomment-46754558 Hi Srowen, markhamstra. I want to merge this into the master branch. Last time I made a mistake, so I resubmitted this patch in https://github.com/apache/spark/pull/1121 and I don't know whether this is right. Can you give me some other suggestions?
[GitHub] spark pull request: Spark SQL add LeftSemiBloomFilterBroadcastJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1127#issuecomment-46754487 Hi Zongheng, I reformatted the code. I don't know whether it is OK now, and I hope you can give me more suggestions. Thanks a lot.
[GitHub] spark pull request: Spark SQL basicOperators add Except operator
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1151#issuecomment-46754425 Hi Zongheng, I tried it, and I tried adding code the way the other operators do. If I want to add this Except operator, do I need to add or modify code in other Scala files? Thanks a lot.
[GitHub] spark pull request: SparkSQL add SkewJoin
Github user YanjieGao commented on the pull request: https://github.com/apache/spark/pull/1134#issuecomment-46754360 Hi rxin, I reformatted it. Can you give me some suggestions? I will try to make it better.
[GitHub] spark pull request: Spark SQL basicOperators add Ecept operator
GitHub user YanjieGao opened a pull request: https://github.com/apache/spark/pull/1151

    Spark SQL basicOperators add Ecept operator

    Hi all, I want to submit an Except operator in basicOperators.scala. SQL supports an except operation on two tables:

        select * from table1 except select * from table2

    This operator supports the subtract function: it returns a table with the elements from `this` that are not in `other`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YanjieGao/spark patch-6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1151.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1151

commit 4eb43ec21a6bf890f52c61dd1502c8a7893d3824
Author: Yanjie Gao
Date: 2014-06-20T07:57:28Z

    Update basicOperators.scala

    Hi all, I want to submit an Except operator in basicOperators.scala. SQL supports an except operation on two tables:

        select * from table1 except select * from table2

    This operator supports the subtract function: it returns a table with the elements from `this` that are not in `other`.
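One detail that the description leaves open is how duplicate rows are handled. The sketch below, on plain Scala collections rather than RDDs, contrasts a literal "elements of `this` that are not in `other`" filter with the distinct result that SQL `EXCEPT` (without `ALL`) produces. The names `ExceptSemantics`, `subtractKeepingDuplicates`, and `exceptDistinct` are hypothetical and are not taken from the pull request.

```scala
object ExceptSemantics {
  // Literal reading of the description: keep every element of `left`
  // that does not occur in `right`. Duplicates in `left` survive.
  def subtractKeepingDuplicates[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.filterNot(x => rightSet.contains(x))
  }

  // SQL EXCEPT without ALL additionally removes duplicate rows.
  def exceptDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] =
    subtractKeepingDuplicates(left, right).distinct
}
```

If the operator is meant to follow SQL semantics exactly, the second form is probably the one to compare test results against.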
[GitHub] spark pull request: Spark SQL basicOperator add Intersect operator
GitHub user YanjieGao opened a pull request: https://github.com/apache/spark/pull/1150

    Spark SQL basicOperator add Intersect operator

    Hi all, I want to submit a basic operator, Intersect. For example, in SQL:

        select * from table1 intersect select * from table2

    I want to use this operator to support this function in Spark SQL. The operator will return the intersection of the RDDs of the SparkPlan's child tables.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YanjieGao/spark patch-5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1150.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1150

commit e5bff618f09b0b33968e4c12b360e3d30f2878f9
Author: Yanjie Gao
Date: 2014-06-20T07:20:12Z

    Spark SQL basicOperator add Intersect operator

    Hi all, I want to submit a basic operator, Intersect. For example, in SQL:

        select * from table1 intersect select * from table2

    I want to use this operator to support this function in Spark SQL. The operator will return the intersection of the RDDs of the SparkPlan's child tables.
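The intended result of `INTERSECT` can be sketched on plain Scala collections as follows. This is an illustration of the SQL semantics (rows present in both inputs, without duplicates), not the operator from the pull request; `IntersectSemantics` and `intersectDistinct` are hypothetical names.

```scala
object IntersectSemantics {
  // Rows that occur in both inputs, each reported once,
  // in the order of their first appearance in `left`.
  def intersectDistinct[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.filter(x => rightSet.contains(x)).distinct
  }
}
```

For example, `IntersectSemantics.intersectDistinct(Seq(1, 1, 2, 3), Seq(1, 3, 3, 4))` keeps only the rows 1 and 3. A test for the real operator should not rely on row order, since a distributed implementation is unlikely to preserve it.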