[GitHub] spark pull request: [SPARK-5920][CORE]BufferedInputStream is added...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/4878
[GitHub] spark pull request: [SPARK-5920][CORE]BufferedInputStream is added...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/4878#issuecomment-77407789 I am sorry, the comments are valid. I am closing this PR. Thank you for reviewing it.
[GitHub] spark pull request: [SPARK-5920][CORE]BufferedInputStream is added...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/4878 [SPARK-5920][CORE]BufferedInputStream is added at required places BufferedInputStream and BufferedOutputStream are added at the required places. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-5920 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4878.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4878 commit c7a739e5575098736247f1c42d07fb5fa110e22b Author: ravipesala ravi.pes...@gmail.com Date: 2015-03-03T18:39:42Z Added BufferedInputStream at required places
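For readers skimming the archive: the PR title describes wrapping raw streams in buffered ones. A minimal sketch of that kind of change, with hypothetical file names (not code from the PR itself):

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, FileInputStream, FileOutputStream}

// Wrapping a raw stream in a buffered one batches small reads/writes
// through an in-memory buffer instead of hitting the OS on every call.
val in = new BufferedInputStream(new FileInputStream("input.dat"))     // hypothetical path
val out = new BufferedOutputStream(new FileOutputStream("output.dat")) // hypothetical path
try {
  val buf = new Array[Byte](8192)
  var n = in.read(buf)
  while (n != -1) {
    out.write(buf, 0, n)
    n = in.read(buf)
  }
} finally {
  out.close()
  in.close()
}
```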
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/4812#issuecomment-76755122 @chenghao-intel Sorry for the late reply. I think semantically it looks fine.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-76770343 @chenghao-intel Thank you for reviewing it. I will go through your comments and fix them. And regarding the ```not in``` case, we can use a ```left outer join```. I will try to add it to the same PR.
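To make the ```not in``` remark concrete, here is a hedged sketch (not code from the PR) of how such a rewrite could be expressed as a Catalyst logical plan: a left outer join on the subquery key, followed by a filter keeping only the rows with no match. The helper name and its parameters are hypothetical; the imported Catalyst classes are real.

```scala
import org.apache.spark.sql.catalyst.expressions.{EqualTo, Expression, IsNull}
import org.apache.spark.sql.catalyst.plans.LeftOuter
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan}

// Hypothetical rewrite of `R1.A NOT IN (SELECT B FROM R2)`:
// left-outer-join R1 to R2 on A = B, then keep rows where no match was
// found (the joined key is null). This ignores the NULL-semantics corner
// cases of NOT IN, which a real implementation would have to handle.
def rewriteNotIn(
    outer: LogicalPlan,
    outerKey: Expression,
    subquery: LogicalPlan,
    subqueryKey: Expression): LogicalPlan = {
  val joined = Join(outer, subquery, LeftOuter, Some(EqualTo(outerKey, subqueryKey)))
  Filter(IsNull(subqueryKey), joined)
}
```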
[GitHub] spark pull request: [SPARK-4226] [SQL] Add Exists support for wher...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/4812#issuecomment-76454856 @chenghao-intel Thank you for your implementation; the following are my observations. The implementation seems simple, but it comes with a lot of limitations. A query like ```select C from R1 where exists (Select B from R2 where R1.X = R2.Y)``` would, I guess, be converted by your implementation into ```select C from R1 left semi join R2 on R1.X = R2.Y``` But that is not syntactically correct; it is supposed to be converted as follows: ```select C from R1 left semi join (select B, R2.Y as sq1_col0 from R2) sq1 on R1.X = sq1.sq1_col0``` Both the ```exists``` and ```in``` implementations should be similar: just adding ```exists``` support in the parser would be enough, and the remaining implementation is almost the same. Beyond the above case, there are many other scenarios that need to be taken care of in subquery expressions. Please refer to https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf. I am waiting to get my code merged, as I am planning to add all the remaining features on top of it.
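As a hedged illustration of the point above (a sketch under stated assumptions, not the PR's code): once the subquery projection is aliased and the correlated predicate is pulled up into the join condition, ```exists``` reduces to the same left semi join used for ```in```. The helper name and its parameters are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.LeftSemi
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}

// Hypothetical rewrite of `... WHERE EXISTS (SELECT ... FROM R2 WHERE R1.X = R2.Y)`:
// `aliasedSubquery` is assumed to already project the correlated column
// (e.g. `R2.Y AS sq1_col0`), and the correlated predicate becomes the
// semi-join condition (e.g. `R1.X = sq1.sq1_col0`).
def rewriteExists(
    outer: LogicalPlan,
    aliasedSubquery: LogicalPlan,
    correlatedCondition: Expression): LogicalPlan =
  Join(outer, aliasedSubquery, LeftSemi, Some(correlatedCondition))
```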
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-73280560 @marmbrus Please check whether it is ok.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-70528755 @marmbrus Please review it.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22448850

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
       Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        conditions.collect {
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            subqueryExprs += s
+        }
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child,
+            subqueryExpr.value,
+            subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+            transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of a subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions to the projections which are used as filters in the subquery
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val withAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22448848

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`.
+ * For example: 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)'
+ * @param subquery In the above example 'SELECT b.key FROM src b' is 'subquery'
+ */
+case class SubqueryExpression(subquery: LogicalPlan) extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = subquery.output.head.dataType
+  override def foldable = false
+  def nullable = true
+  override def toString = s"SubqueryExpression($subquery)"
--- End diff --

Ok. I will change it.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-68673618 Thank you for reviewing it. Fixed the review comments, and added a TODO for future expansion to more complex queries.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22448842

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
       Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        conditions.collect {
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            subqueryExprs += s
+        }
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child,
+            subqueryExpr.value,
+            subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+            transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        } else {
+          filter
+        }
--- End diff --

Thanks for the code snippet. It is good; I will add it like this.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22448876

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
       Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        conditions.collect {
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            subqueryExprs += s
+        }
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child,
+            subqueryExpr.value,
+            subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+            transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
--- End diff --

Yes, these types of queries cannot be evaluated in the present code. Maybe we can expand it in the future. I will add the TODO. And even Hive does not seem to work with these types of queries.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148838

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
--- End diff --

Ok. Done in two steps.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148840

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
--- End diff --

Ok
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148841

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
--- End diff --

Ok
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148845

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression, subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(
+        value, subquery)
--- End diff --

Ok.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression, subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(
+        value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than 1 item in the select list of a subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
--- End diff --

Here guarding may not work, because SubqueryExpression does not resolve with the main query, so I guess we need to resolve it on a need basis.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r2214

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression, subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(
+        value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than 1 item in the select list of a subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions to the projections which are used as filters in the subquery
+          val toBeAddedExprs = f.references.filter(
+            a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
--- End diff --

Here guarding may not work, because SubqueryExpression does not resolve with the main query, so I guess we need to resolve it on a need basis. And I used ```project.outputSet``` to filter.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148896

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression, subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(
+        value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than 1 item in the select list of a subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions to the projections which are used as filters in the subquery
+          val toBeAddedExprs = f.references.filter(
+            a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+          val cache = collection.mutable.Map[String, String]()
+          // Create aliases for all projection expressions.
+          val withAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              cache.put(exp.name, s"sqc$index")
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
+            case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+              UnresolvedAttribute("subquery." + cache.get(a.name).get)
--- End diff --

Ok. Added ```Alias``` in all places and removed
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148903

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`.
+ * For example: 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)'
+ * @param value In the above example 'a.key' is 'value'
+ * @param subquery In the above example 'SELECT b.key FROM src b' is 'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression {
--- End diff --

Ok. Added it like ```In('a.key, SubqueryExpression(...))```
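To spell out the representation mentioned in that reply, here is a hedged sketch assuming the ```SubqueryExpression``` class from the diff above is in scope; ```subqueryPlan``` and the attribute name are hypothetical placeholders:

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.In
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// The parser wraps the subquery's logical plan in SubqueryExpression and
// places it as the single element of In's value list, so the analyzer rule
// can pattern match on In(value, Seq(SubqueryExpression(plan))).
def inSubquery(subqueryPlan: LogicalPlan): In =
  In(UnresolvedAttribute("a.key"), Seq(SubqueryExpression(subqueryPlan)))
```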
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148913

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`.
+ * For example: 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)'
+ * @param value In the above example 'a.key' is 'value'
+ * @param subquery In the above example 'SELECT b.key FROM src b' is 'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = value.dataType
+  override def foldable = value.foldable
+  def nullable = value.nullable
+  override def toString = s"SubqueryExpression($value, $subquery)"
+  override lazy val resolved = childrenResolved
--- End diff --

Yes, it is always unresolved. It will only be converted, not executed.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22148917

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`.
+ * For example: 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)'
+ * @param value In the above example 'a.key' is 'value'
+ * @param subquery In the above example 'SELECT b.key FROM src b' is 'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = value.dataType
+  override def foldable = value.foldable
+  def nullable = value.nullable
--- End diff --

Yes. I guess it is nullable.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r22149004

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
     protected def containsStar(exprs: Seq[Expression]): Boolean =
       exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause to join queries.
+   * Case 1: Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery on R1.A = subquery.sqc0
+   *
+   * Case 2: Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from R2) subquery
+   * on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   *
+   * Refer: https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = new scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+        val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+        val transformedConds = conditions.transform {
+          // Replace with dummy
+          case s @ SubqueryExpression(exp, subquery) =>
+            subqueryExprs += s
+            Literal(true)
+        }
+        if (subqueryExprs.size == 1) {
+          val subqueryExpr = subqueryExprs.remove(0)
+          createLeftSemiJoin(
+            child, subqueryExpr.value,
+            subqueryExpr.subquery, transformedConds)
+        } else if (subqueryExprs.size > 1) {
+          // Only one subquery expression is supported.
+          throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+        } else {
+          filter
+        }
+    }
+
+    /**
+     * Create LeftSemi join with parent query to the subquery which is mentioned in 'IN' predicate
+     * and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression, subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(
+        value, subquery)
+      // Unify the parent query conditions and subquery conditions and add these as join conditions
+      val unifyConds = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as filters to the
+     * projection. Also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than 1 item in the select list of a subquery
+          if (projectList.size > 1) {
+            throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions to the projections which are used as filters in the subquery
+          val toBeAddedExprs = f.references.filter(
+            a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+          val cache = collection.mutable.Map[String, String]()
--- End diff --

I may not be able to use ```AttributeMap``` without resolving the subquery completely.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-67778115 Thank you for reviewing it. I have worked on the review comments. Please review it. I guess the ```SubqueryExpression``` may not be resolved along with the main query, and we also may not be able to resolve it separately, as it may contain references to the main query, like ```select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)```. Here R1.X is used inside the subquery. So I guess we can resolve it on a need basis. Please comment.
[GitHub] spark pull request: [SPARK-2554][SQL] Supporting SumDistinct parti...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3348#issuecomment-65766741 I have rebased with master. Please review.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3249#issuecomment-65393775 Rebased with master and fixed the review comments.
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/3249#discussion_r21225534

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+case class SubqueryExpression(exp: Expression, child: LogicalPlan) extends Expression {
--- End diff --

Thank you for your comments. Here ```exp``` is the predicate value. For example, in ```SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)```, ```exp``` is ```a.key``` and ```child``` is the subquery. I have now updated their names and added documentation.
[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3387#issuecomment-65186523 This was already merged into master.
[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/3387
[GitHub] spark pull request: [SPARK-4658][SQL] Code documentation issue in ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3516 [SPARK-4658][SQL] Code documentation issue in DDL of datasource API You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark ddl_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3516.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3516 commit d2238cda19eebe5cbdf153e332ef52437cca015a Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-30T03:14:21Z Corrected documentation
[GitHub] spark pull request: [SPARK-4648][SQL] Support COALESCE function in...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3510 [SPARK-4648][SQL] Support COALESCE function in Spark SQL and Hive QL Currently HiveQL uses the Hive UDF for COALESCE. Hive UDFs are usually memory-intensive, and since a native Coalesce expression is already available in Spark, we can make use of it. This also adds support for the COALESCE function in Spark SQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark Coalesce Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3510.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3510 commit bbdeebe645d0f045d51b7e9e9adc379fbd7cdc55 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-28T14:04:32Z Support COALESCE function in Spark SQL and Hive QL
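A quick illustration of what native support buys (illustrative only; assumes a Spark 1.x `SQLContext` built on an existing `SparkContext` named `sc`, as in spark-shell):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // `sc`: an existing SparkContext

// With native support, COALESCE parses in plain Spark SQL and is planned
// as Spark's own Coalesce expression instead of the Hive UDF:
sqlContext.sql("SELECT COALESCE(NULL, NULL, 1)").collect()
// expected: Array([1]), the first non-null argument
```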
[GitHub] spark pull request: [SPARK-4650][SQL] Supporting multi column supp...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3511 [SPARK-4650][SQL] Supporting multi column support in countDistinct function like count(distinct c1,c2..) in Spark SQL This adds multi-column support to the countDistinct function, i.e. count(distinct c1, c2, ...), in Spark SQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark countdistinct Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3511.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3511 commit 070e12a461093cb534e97430abebd78e8ee83275 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-28T19:42:10Z Supporting multi column support in count(distinct c1,c2..) in Spark SQL
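For example (illustrative; assumes a `SQLContext` named `sqlContext` with a registered table `people(firstName, lastName)`):

```scala
// Counts distinct (firstName, lastName) pairs, not a single column:
sqlContext.sql(
  "SELECT COUNT(DISTINCT firstName, lastName) FROM people").collect()
```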
[GitHub] spark pull request: [SPARK-4648][SQL] Support COALESCE function in...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3510#issuecomment-64925533 retest please
[GitHub] spark pull request: [SPARK-4513][SQL] Support relational operator ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3387 [SPARK-4513][SQL] Support relational operator '<=>' in Spark SQL The relational operator '<=>' (null-safe equality) is not working in Spark SQL; the same works in Spark HiveQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark <=> Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3387.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3387 commit 7198e90fd6458bc44c0c40762c0d493d240e5e69 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-20T19:04:04Z Supporting relational operator '<=>' in Spark SQL
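The operator's semantics, for context (illustrative; assumes a `SQLContext` named `sqlContext` and a table `t(a INT)` that may contain NULLs):

```scala
// `<=>` is null-safe equality: it always yields true or false, never NULL,
// and it treats two NULLs as equal.
sqlContext.sql("SELECT * FROM t WHERE a <=> NULL").collect()
// returns the rows where a IS NULL; `a = NULL` would match no rows,
// because ordinary `=` evaluates to NULL when either side is NULL.
```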
[GitHub] spark pull request: [SPARK-2554][SQL] Supporting SumDistinct parti...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3348 [SPARK-2554][SQL] Supporting SumDistinct partial aggregation Adds support for partial aggregation of SumDistinct. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-2554 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3348.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3348 commit 4a31ca75dc44ff239a829ef1ba4a19a63042ce92 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-18T21:01:32Z Supporting SumDistinct partial aggregation
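The idea behind partial distinct aggregation, as a plain-Scala sketch (Spark's real implementation operates on Catalyst aggregate expressions, not Scala collections):

```scala
object PartialSumDistinctDemo extends App {
  // Map side: each partition deduplicates locally.
  def partial(values: Iterator[Int]): Set[Int] = values.toSet
  // Merge: partial results are combined by set union, so duplicates
  // seen on different partitions still collapse to one value.
  def merge(a: Set[Int], b: Set[Int]): Set[Int] = a union b
  // Final step: only now is the sum taken.
  def result(s: Set[Int]): Int = s.sum

  val p1 = partial(Iterator(1, 2, 2))
  val p2 = partial(Iterator(2, 3))
  println(result(merge(p1, p2)))  // 6 = 1 + 2 + 3
}
```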
[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3249 [SPARK-4226][SQL] SparkSQL - Add support for subqueries in predicates ('in' clause)

This PR supports subqueries in the predicate 'in' clause. The queries are transformed to a LeftSemi join as follows.

Case 1, uncorrelated queries:
-- original query
select C from R1 where R1.A in (Select B from R2)
-- rewritten query
select C from R1 left semi join R2 on R1.A = R2.B

Case 2, correlated queries:
-- original query
select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
-- rewritten query
select C from R1 left semi join (select B, R2.Y as sq1_col0 from R2) sq1 on R1.X = sq1.sq1_col0 and R1.A = sq1.B

Restriction: aliases need to be used, as we convert the query into joins. The complete specification is available at https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf

You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-4226 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3249 commit b670862276d828d94ca5da94a22944e98f8aaa55 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-13T18:02:12Z Supporting subqueries inside where 'in' clause commit ccaddcb8c3229ba039c74acb8172f97ca99ffba1 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-13T18:03:29Z Added new expression class
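A sketch of the uncorrelated rewrite with plain-Scala stand-ins (not Spark's Catalyst types; the real rule runs as a logical-plan transformation):

```scala
object RewriteDemo extends App {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class EqualTo(l: Expr, r: Expr) extends Expr
  case class InSubquery(value: Expr, sub: Plan, subOutput: Attr) extends Expr

  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(cond: Expr, child: Plan) extends Plan
  case class LeftSemiJoin(left: Plan, right: Plan, on: Expr) extends Plan

  // Filter(A IN (subquery), R1)  ==>  R1 LEFT SEMI JOIN subquery ON A = B
  def rewriteInClause(plan: Plan): Plan = plan match {
    case Filter(InSubquery(v, sub, out), child) =>
      LeftSemiJoin(child, sub, EqualTo(v, out))
    case other => other
  }

  // select C from R1 where R1.A in (select B from R2)
  val before = Filter(
    InSubquery(Attr("R1.A"), Relation("R2"), Attr("R2.B")), Relation("R1"))
  println(rewriteInClause(before))
  // LeftSemiJoin(Relation(R1),Relation(R2),EqualTo(Attr(R1.A),Attr(R2.B)))
}
```

The correlated case additionally pulls the outer references (R1.X here) up into the join condition, which is why the subquery's columns must be projected out with aliases first.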
[GitHub] spark pull request: [SPARK-4207][SQL] Query which has syntax like ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3075 [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL Queries that use 'not like' do not work in Spark SQL, e.g. ```sql("SELECT * FROM records where value not like 'val%'")```; the same query works in Spark HiveQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-4207 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3075.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3075 commit 35c11e759aca32fb1162f3232dd27e7d2351b4db Author: ravipesala ravindra.pes...@huawei.com Date: 2014-11-03T16:41:37Z Supported 'not like' syntax in sql
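A self-contained sketch of the kind of grammar change involved, using scala.util.parsing.combinator (the library Spark's 1.x SqlParser is built on); this is an illustration, not Spark's actual code:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object NotLikeDemo extends JavaTokenParsers {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(s: String) extends Expr
  case class Like(e: Expr, pattern: Expr) extends Expr
  case class Not(e: Expr) extends Expr

  def column: Parser[Expr] = ident ^^ Col
  def literal: Parser[Expr] =
    stringLiteral ^^ (s => Lit(s.drop(1).dropRight(1)))

  // `NOT LIKE` parses to Not(Like(..)); plain LIKE to Like(..).
  def predicate: Parser[Expr] =
    column ~ opt("not") ~ ("like" ~> literal) ^^ {
      case c ~ Some(_) ~ p => Not(Like(c, p))
      case c ~ None ~ p    => Like(c, p)
    }

  def main(args: Array[String]): Unit =
    println(parseAll(predicate, "value not like \"val%\""))
    // expected: Not(Like(Col(value),Lit(val%)))
}
```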
[GitHub] spark pull request: [SPARK-4154][SQL] Query does not work if it ha...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/3017#issuecomment-61291682 Thank you for your comment. I handled it and also rebased with master. Please review it.
[GitHub] spark pull request: [SPARK-4154][SQL] Query does not work if it ha...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/3017 [SPARK-4154][SQL] Query does not work if it has 'not between' In Spark SQL and HQL, a query that contains 'not between' does not work, e.g. ```SELECT * FROM src where key not between 10 and 20```. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-4154 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3017.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3017 commit 805eebc9edb2bcff2b99c2ec5b58b8fa4398b316 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-30T15:49:03Z 'not between' is not working
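The natural desugaring, sketched with stand-in classes (not Spark's Catalyst expressions):

```scala
object NotBetweenDemo extends App {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class GreaterOrEqual(l: Expr, r: Expr) extends Expr
  case class LessOrEqual(l: Expr, r: Expr) extends Expr
  case class And(l: Expr, r: Expr) extends Expr
  case class Not(e: Expr) extends Expr

  // key NOT BETWEEN 10 AND 20  ==>  NOT (key >= 10 AND key <= 20)
  def notBetween(key: Expr, low: Expr, high: Expr): Expr =
    Not(And(GreaterOrEqual(key, low), LessOrEqual(key, high)))

  println(notBetween(Col("key"), Lit(10), Lit(20)))
}
```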
[GitHub] spark pull request: [SPARK-4120][SQL] Join of multiple tables with...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2987 [SPARK-4120][SQL] Join of multiple tables with syntax like SELECT .. FROM T1,T2,T3.. does not work in SparkSQL Right now it works for only two tables, as in ```sql("SELECT * FROM records1 as a, records2 as b where a.key=b.key")```, but it does not work for more than two tables, as in ```sql("SELECT * FROM records1 as a, records2 as b, records3 as c where a.key=b.key and a.key=c.key")```. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark multijoin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2987.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2987 commit 429b00515b6265ec44589b94fba55d9d6727d8d8 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-29T02:49:44Z Support multiple joins
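The grammar fix amounts to accepting a comma-separated list of relations in the FROM clause; a self-contained sketch with scala.util.parsing.combinator (illustrative; Spark's SqlParser then folds the parsed list into nested joins whose keys come from the WHERE predicates):

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object MultiFromDemo extends JavaTokenParsers {
  case class Rel(name: String, alias: Option[String])

  def relation: Parser[Rel] =
    ident ~ opt("as" ~> ident) ^^ { case n ~ a => Rel(n, a) }

  // rep1sep accepts one or more relations: FROM t1, t2, t3 ...
  def fromClause: Parser[List[Rel]] =
    "from" ~> rep1sep(relation, ",")

  def main(args: Array[String]): Unit =
    println(parseAll(fromClause,
      "from records1 as a, records2 as b, records3 as c"))
}
```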
[GitHub] spark pull request: [SPARK-3814][SQL] Support for Bitwise AND(&), ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2961 [SPARK-3814][SQL] Support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in Spark HQL and SQL Currently there is no support for bitwise & and | in Spark HiveQL or Spark SQL, so this PR adds it. I am closing https://github.com/apache/spark/pull/2926 as it has conflicts to merge. This PR also adds support for bitwise AND(&), OR(|), XOR(^), NOT(~) and handles all review comments from that PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3814-NEW4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2961 commit a391c7ac4a7faaf7a6f5ac6c31114cf983edfb43 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-27T18:20:21Z Rebase with master
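For reference, the operators this enables (illustrative; assumes a `SQLContext` named `sqlContext` and a table `src(key INT)`):

```scala
// All four bitwise operators parse in both Spark SQL and HiveQL:
sqlContext.sql("SELECT key & 1, key | 2, key ^ 3, ~key FROM src").collect()
```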
[GitHub] spark pull request: [SPARK-3814][SQL] Support for Bitwise AND(&), ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2961#issuecomment-60644658 test this please
[GitHub] spark pull request: [SPARK-3814][SQL] Support for Bitwise AND(&), ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2926#issuecomment-60644904 Closed this PR and created new PR https://github.com/apache/spark/pull/2961 after rebasing with master.
[GitHub] spark pull request: [SPARK-3814][SQL] Support for Bitwise AND(&), ...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2926
[GitHub] spark pull request: [SPARK-3814][SQL] Support for Bitwise AND(&), ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2926 [SPARK-3814][SQL] Support for Bitwise AND(&), OR(|), XOR(^), NOT(~) in Spark HQL and SQL Currently there is no support for bitwise & and | in Spark HiveQL or Spark SQL, so this PR adds it. I am closing https://github.com/apache/spark/pull/2789 as it has conflicts to merge. This PR also adds support for bitwise AND(&), OR(|), XOR(^), NOT(~) and handles all review comments from that PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3814-NEW3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2926.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2926 commit 90ebbe095476246ffbb159ad5461cd5990348589 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-24T12:17:35Z Rebased with master and handled comments
[GitHub] spark pull request: [SPARK-3483][SQL] Special chars in column name...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2927 [SPARK-3483][SQL] Special chars in column names Supporting special characters in column names by using backticks. Closed https://github.com/apache/spark/pull/2804 and created this PR, as that one had merge conflicts. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3483-NEW Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2927.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2927 commit f6329f35c5254b0dd2275dfb4ced5954276487fb Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-24T12:55:10Z Rebased with master
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2789
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2789#issuecomment-60383108 Closed this PR as it has merge conflicts and created new PR https://github.com/apache/spark/pull/2926, where the comments are handled.
[GitHub] spark pull request: [SPARK-3483][SQL] Special chars in column name...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2804
[GitHub] spark pull request: [SPARK-3483][SQL] Special chars in column name...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2804#issuecomment-60383267 Closed this PR as it has merge conflicts and created new PR https://github.com/apache/spark/pull/2927
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2789#issuecomment-59329218 Added support for bitwise AND(&), OR(|), XOR(^), NOT(~) in this same PR. Please review it.
[GitHub] spark pull request: [SPARK-3483][SQL] Special chars in column name...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2804 [SPARK-3483][SQL] Special chars in column names Supporting special characters in column names by using backticks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3483 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2804 commit 477e883b77934d5a34172ff894ba8fb4551035ea Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-14T17:05:33Z Special chars in column names
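For example (illustrative; assumes a `SQLContext` named `sqlContext` with a registered table `t` whose column name contains dots):

```scala
// Backticks let identifiers carry characters the lexer would otherwise
// reject, such as dots or spaces:
sqlContext.sql("SELECT `key.with.dots` FROM t").collect()
```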
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2789#issuecomment-59087699 @marmbrus Please review this PR; I handled the review comments of PR https://github.com/apache/spark/pull/2736. Due to merge conflicts I have created this new PR.
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2736#issuecomment-58872032 Thank you @scwf. I have created a new PR since the old one has merge conflicts. It would not be clean if I rebased and pushed to the old PR, because it would show all the files that were merged in while rebasing.
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2772#issuecomment-58980787 Again merge conflicts :)
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2772
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2789 [SPARK-3814][SQL] Bitwise & does not work in Hive Currently there is no support for bitwise & and | in Spark HiveQL or Spark SQL, so this PR adds it. I am closing https://github.com/apache/spark/pull/2736 as it has conflicts to merge, and I handled all review comments from that PR here. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3814-NEW2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2789.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2789 commit 3233c1a904d800598010f0d9a0fbb1588c94feac Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-14T02:28:07Z Supporting Bitwise &, | in Spark HiveQL and SQL
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2789#issuecomment-58984605 ok to test
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2772 [SPARK-3814][SQL] Bitwise & does not work in Hive Currently there is no support for bitwise & and | in Spark HiveQL or Spark SQL, so this PR adds it. I am closing https://github.com/apache/spark/pull/2736 as it has conflicts to merge, and I handled all review comments from that PR here. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3814-NEW1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2772.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2772 commit a73367c11dfaffef4a7f95460569d6707c95f731 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-11T21:34:15Z Supporting Bitwise &, | in Spark SQL and HiveQl
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2736#issuecomment-58765712 Since this PR has conflicts, I created new PR https://github.com/apache/spark/pull/2772 and handled the review comments in it.
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2736
[GitHub] spark pull request: [SPARK-3834][SQL] Backticks not correctly hand...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2737 [SPARK-3834][SQL] Backticks not correctly handled in subquery aliases Queries like ```SELECT a.key FROM (SELECT key FROM src) `a` ``` do not work, as backticks in subquery aliases are not handled properly. This PR fixes that. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3834 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2737.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2737 commit 0e0ab984cf58374914a4766a69e545c84022e2ed Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-09T21:23:40Z Fixing issue in backtick handling for subquery aliases
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2710 [SPARK-3814][SQL] Bitwise & does not work in Hive Currently there is no support for bitwise & in Spark HiveQL or Spark SQL, so this PR adds it. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3814 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2710.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2710 commit 41e840b0ce6fe8321b131a80c8444ad70ff24c0b Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-08T11:58:46Z Supporting Bitwise & in Spark HiveQl and SQL
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise & does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2710#issuecomment-58458547 @marmbrus It seems git could not fetch the code; that's why it failed.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2678#issuecomment-58464496 @marmbrus Can you also verify this PR? Thank you.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2678 [SPARK-3813][SQL] Support 'case when' conditional functions in Spark SQL The 'case when' conditional function is already supported in Spark SQL, but there is no support for it in SqlParser, so this adds parser support. Author: ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-3813 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2678 commit 709684f1036e1ab8595f94c2d3c5314c29a20063 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-06T15:42:02Z Changed parser to support case when function.
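Both CASE forms this enables, for reference (illustrative; assumes a `SQLContext` named `sqlContext` and a table `src(key INT)`):

```scala
// "Searched" form: each WHEN carries a full predicate.
sqlContext.sql(
  "SELECT CASE WHEN key = 0 THEN 'zero' ELSE 'other' END FROM src")

// "Simple" form: CASE <expr> compares the expression to each WHEN value.
sqlContext.sql(
  "SELECT CASE key WHEN 0 THEN 'zero' WHEN 1 THEN 'one' ELSE 'other' END FROM src")
```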
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2678#discussion_r18471169

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -333,6 +338,24 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     IF ~> "(" ~> expression ~ "," ~ expression ~ "," ~ expression <~ ")" ^^ {
       case c ~ "," ~ t ~ "," ~ f => If(c, t, f)
     } |
+    CASE ~> opt(expression) ~ (WHEN ~ expression ~ THEN ~ expression).* ~
+      opt(ELSE ~> expression) <~ END ^^ {
+        case c ~ l ~ el =>
+          var caseWhenExpr = l.map { x =>
+            x match {
+              case w ~ we ~ t ~ te =>
+                c match {
+                  case Some(e) => Seq(EqualTo(e, we), te)
+                  case None => Seq(we, te)
+                }
+            }
+          }.toSeq.reduce(_ ++ _)
+          caseWhenExpr = el match {
+            case Some(e) => caseWhenExpr ++ Seq(e)
+            case None => caseWhenExpr
+          }
+          CaseWhen(caseWhenExpr)
+      } |
--- End diff --

Awesome! The code is greatly simplified. Thank you for your comments; I updated the code as per your suggestions.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2678#discussion_r18496795

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -128,6 +128,11 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val UNION = Keyword("UNION")
   protected val UPPER = Keyword("UPPER")
   protected val WHERE = Keyword("WHERE")
+  protected val CASE = Keyword("CASE")
+  protected val WHEN = Keyword("WHEN")
+  protected val THEN = Keyword("THEN")
+  protected val ELSE = Keyword("ELSE")
+  protected val END = Keyword("END")
--- End diff --

OK. Sorted.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2678#discussion_r18496791

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -333,6 +338,15 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     IF ~> "(" ~> expression ~ "," ~ expression ~ "," ~ expression <~ ")" ^^ {
       case c ~ "," ~ t ~ "," ~ f => If(c, t, f)
     } |
+    CASE ~> expression.? ~ (WHEN ~> expression ~ (THEN ~> expression)).* ~
+      (ELSE ~> expression).? <~ END ^^ {
+        case casePart ~ altPart ~ elsePart =>
+          val altExprs = altPart.flatMap {
--- End diff --

OK. Updated.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2678#discussion_r18496800

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -333,6 +338,15 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     IF ~> "(" ~> expression ~ "," ~ expression ~ "," ~ expression <~ ")" ^^ {
       case c ~ "," ~ t ~ "," ~ f => If(c, t, f)
     } |
+    CASE ~> expression.? ~ (WHEN ~> expression ~ (THEN ~> expression)).* ~
+      (ELSE ~> expression).? <~ END ^^ {
+        case casePart ~ altPart ~ elsePart =>
+          val altExprs = altPart.flatMap {
+            case we ~ te =>
+              Seq(casePart.fold(we)(EqualTo(_, we)), te)
--- End diff --

OK. Updated.
[GitHub] spark pull request: [SPARK-3813][SQL] Support case when conditio...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2678#issuecomment-58126077 Thank you for reviewing it. I have updated the code as per your comments. Please review it.
[GitHub] spark pull request: [SPARK-2693][SQL] Supported for UDAF Hive Aggr...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2620#issuecomment-57819925 You are right, it is not good to pass the resolver in the constructor. Instead I just passed a boolean flag.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2590#issuecomment-57595734 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-2693][SQL] Supported for UDAF Hive Aggr...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2620 [SPARK-2693][SQL] Support for UDAF Hive Aggregates like PERCENTILE Implemented UDAF Hive aggregates by adding a wrapper to Spark Hive. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-2693 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2620.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2620 commit 57862006cbba635284efe29ce35fefe1d2f25b04 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-10-01T19:44:37Z Supported for UDAF Hive Aggregates like PERCENTILE
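For example (illustrative; assumes a `HiveContext` named `hiveContext` and a Hive table `src(key INT, value STRING)`):

```scala
// percentile is a Hive UDAF (not a plain UDF), so it needs the
// aggregate wrapper this PR adds:
hiveContext.sql("SELECT percentile(key, 0.5) FROM src").collect()
```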
[GitHub] spark pull request: [SPARK-3708][SQL] Backticks aren't handled cor...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2594#issuecomment-57528756 @marmbrus I am not sure why it failed. The error shows git could not fetch the code and timed out. Do I have to do something here?
[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2511#discussion_r18321536

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -166,7 +186,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val withFilter = f.map(f => Filter(f, base)).getOrElse(base)
     val withProjection = g.map { g =>
-      Aggregate(assignAliases(g), assignAliases(p), withFilter)
+      Aggregate(assignAliasesForGroups(g, p), assignAliases(p), withFilter)
--- End diff --

Yes @marmbrus, it is better to remove assignAliases from the grouping expressions. I updated the code accordingly; please review.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2590#issuecomment-57581369 Fixed the code as per the comments; please review.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r18211038

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSqlParser.scala ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import scala.language.implicitConversions
+import scala.util.parsing.combinator.syntactical.StandardTokenParsers
+import scala.util.parsing.combinator.PackratParsers
+import scala.util.parsing.input.CharArrayReader.EofCh
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.SqlLexical
+import scala.util.parsing.combinator.lexical.StdLexical
+
+/**
+ * A simple Hive SQL pre-parser. It parses commands like cache, uncache, etc.;
+ * the remaining actual query will be parsed by HiveQl.parseSql.
+ */
--- End diff --

Looks good. I updated it as per your comment.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r18211061

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSqlParser.scala ---
@@ -0,0 +1,138 @@
+/* (Apache license header and imports identical to the previous diff) */
+
+/**
+ * A simple Hive SQL pre-parser. It parses commands like cache, uncache, etc.;
+ * the remaining actual query will be parsed by HiveQl.parseSql.
+ */
+class HiveSqlParser extends StandardTokenParsers with PackratParsers {
--- End diff --

OK. Changed the file name.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r18211084

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSqlParser.scala ---
@@ -0,0 +1,138 @@
+/* (Apache license header, imports and scaladoc identical to the previous diffs) */
+class HiveSqlParser extends StandardTokenParsers with PackratParsers {
+
+  def apply(input: String): LogicalPlan = {
+    // Special-case out set commands since the value fields can be
+    // complex to handle without RegexParsers. Also this approach
+    // is clearer for the several possible cases of set commands.
+    if (input.trim.toLowerCase.startsWith("set")) {
--- End diff --

OK.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r1822

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSqlParser.scala ---

+class HiveSqlParser extends StandardTokenParsers with PackratParsers {
+
+  def apply(input: String): LogicalPlan = {
+    // Special-case out set commands since the value fields can be
+    // complex to handle without RegexParsers. Also this approach
+    // is clearer for the several possible cases of set commands.
+    if (input.trim.toLowerCase.startsWith("set")) {
+      input.trim.drop(3).split("=", 2).map(_.trim) match {
+        case Array("") => // "set"
+          SetCommand(None, None)
+        case Array(key) => // "set key"
+          SetCommand(Some(key), None)
+        case Array(key, value) => // "set key=value"
+          SetCommand(Some(key), Some(value))
+      }
+    } else if (input.trim.startsWith("!")) {
+      ShellCommand(input.drop(1))
+    } else {
+      phrase(query)(new lexical.Scanner(input)) match {
+        case Success(r, x) => r
+        case x => sys.error(x.toString)
+      }
+    }
+  }
+
+  protected case class Keyword(str: String)
+
+  protected val CACHE = Keyword("CACHE")
+  protected val SET = Keyword("SET")
+  protected val ADD = Keyword("ADD")
+  protected val JAR = Keyword("JAR")
+  protected val TABLE = Keyword("TABLE")
+  protected val AS = Keyword("AS")
+  protected val UNCACHE = Keyword("UNCACHE")
+  protected val FILE = Keyword("FILE")
+  protected val DFS = Keyword("DFS")
+  protected val SOURCE = Keyword("SOURCE")
+
+  protected implicit def asParser(k: Keyword): Parser[String] =
+    lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)
+
+  protected def allCaseConverse(k: String): Parser[String] =
+    lexical.allCaseVersions(k).map(x => x : Parser[String]).reduce(_ | _)
+
+  protected val reservedWords =
+    this.getClass
+      .getMethods
+      .filter(_.getReturnType == classOf[Keyword])
+      .map(_.invoke(this).asInstanceOf[Keyword].str)
+
+  override val lexical = new SqlLexical(reservedWords)
+
+  protected lazy val query: Parser[LogicalPlan] = (
+    cache | unCache | addJar | addFile | dfs | source | hiveQl
+  )

--- End diff --

Updated.
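The set handling in the diff above is plain string surgery done before any combinator runs: drop the keyword, split on the first '=' only, trim. A minimal standalone sketch of that logic (SetCommandDemo and the expected-output comments are illustrative, not Spark code):
```
object SetCommandDemo {
  // Mirrors the pre-parser's handling of "set [key[=value]]": drop the
  // leading keyword, split on the first '=' only, and trim both sides.
  def parseSet(input: String): (Option[String], Option[String]) =
    input.trim.drop(3).split("=", 2).map(_.trim) match {
      case Array("")         => (None, None)             // "set"           -> list all properties
      case Array(key)        => (Some(key), None)        // "set key"       -> read one property
      case Array(key, value) => (Some(key), Some(value)) // "set key=value" -> write one property
    }

  def main(args: Array[String]): Unit = {
    println(parseSet("set"))                   // (None,None)
    println(parseSet("set spark.sql.dialect")) // (Some(spark.sql.dialect),None)
    println(parseSet("set a.b = c=d"))         // (Some(a.b),Some(c=d)): only the first '=' splits
  }
}
```
Splitting with a limit of 2 is what keeps values containing '=' intact, which is the "complex to handle without RegexParsers" case the code comment alludes to.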
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r18211107

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---

@@ -75,6 +75,9 @@ class LocalHiveContext(sc: SparkContext) extends HiveContext(sc) {
  */
 class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   self =>
+
+  @transient
+  protected[sql] val hiveParser = new HiveSqlParser

--- End diff --

OK. I have moved it to HiveQl now.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2590#discussion_r18211122

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSqlParser.scala ---

+  protected lazy val query: Parser[LogicalPlan] = (
+    cache | unCache | addJar | addFile | dfs | source | hiveQl
+  )
+
+  protected lazy val hiveQl: Parser[LogicalPlan] =
+    remainingQuery ^^ {
+      case r => HiveQl.parseSql(r.trim())
+    }
+
+  /** Returns all of the remaining query. */
+  protected lazy val remainingQuery: Parser[String] = new Parser[String] {
+    def apply(in: Input) = Success(in.source.subSequence(in.offset, in.source.length).toString,

--- End diff --

Updated.
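The remainingQuery combinator above succeeds with whatever input is left and hands it to HiveQl.parseSql. A standalone sketch of the same trick under scala-parser-combinators (RestDemo is an illustrative name, not the PR's code):
```
import scala.util.parsing.combinator.RegexParsers

object RestDemo extends RegexParsers {
  // A parser that succeeds with the untouched tail of the input, positioning
  // the reader at the end so phrase(...) treats the input as fully consumed.
  val rest: Parser[String] = new Parser[String] {
    def apply(in: Input) = {
      val tail = in.source.subSequence(in.offset, in.source.length).toString
      Success(tail, in.drop(in.source.length - in.offset))
    }
  }

  def main(args: Array[String]): Unit = {
    println(parseAll(rest, "SELECT * FROM src")) // [1.18] parsed: SELECT * FROM src
  }
}
```
Because rest always succeeds, it must be the last alternative in the query rule; otherwise it would swallow input meant for the cache/uncache/add-jar commands.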
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2590#issuecomment-57299430

Thank you. I have updated the code per your comments; please review it.
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2590#issuecomment-57335446

Thank you for your comments. I have updated the code accordingly; please review.
[GitHub] spark pull request: [SPARK-3708][SQL] Backticks aren't handled cor...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2594

[SPARK-3708][SQL] Backticks aren't handled correctly in aliases

The query below fails:
```
sql("SELECT k FROM (SELECT `key` AS `k` FROM src) a")
```
It fails because the backticks in the aliases are not cleaned, so the aliases cannot be resolved in further processing.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark SPARK-3708

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2594.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2594

commit d55db54a65c0cc8f743b3c7b775abebd13c8e0fa
Author: ravipesala <ravindra.pes...@huawei.com>
Date: 2014-09-30T16:29:42Z

Fixed SPARK-3708 (Backticks aren't handled correctly in aliases)
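The fix amounts to stripping the surrounding backticks before an alias is registered, so that `k` and k resolve to the same attribute. A minimal sketch, with cleanAlias as a hypothetical helper rather than the PR's actual code:
```
object BacktickDemo {
  // Strip one pair of surrounding backticks from a quoted identifier.
  def cleanAlias(alias: String): String =
    if (alias.length > 1 && alias.startsWith("`") && alias.endsWith("`"))
      alias.substring(1, alias.length - 1) // `k` -> k
    else
      alias                                // already clean

  def main(args: Array[String]): Unit = {
    assert(cleanAlias("`k`") == "k")
    assert(cleanAlias("k") == "k")
    println("ok")
  }
}
```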
[GitHub] spark pull request: [SPARK-3654][SQL] Implement all extended HiveQ...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2590

[SPARK-3654][SQL] Implement all extended HiveQL statements/commands with a separate parser combinator

Created a separate parser for HQL. It pre-parses commands such as cache, uncache, and add jar, and then parses the rest with HiveQl.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark SPARK-3654

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2590.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2590

commit ba26cd1a5895fa71a028992b5c946bd8333dfdc3
Author: ravipesala <ravindra.pes...@huawei.com>
Date: 2014-09-30T04:21:42Z

Created separate parser for HQL. It pre-parses commands like cache, uncache, add jar, etc., and then parses with HiveQl
[GitHub] spark pull request: [SPARK-3371][SQL] Renaming a function expressi...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2511

[SPARK-3371][SQL] Renaming a function expression with group by gives error

The following code gives an error:
```
sqlContext.registerFunction("len", (s: String) => s.length)
sqlContext.sql("select len(foo) as a, count(1) from t1 group by len(foo)").collect()
```
The SQL parser wraps functions in grouping expressions in aliases with generated names. So if the user gives an alias to the same function in the projection, it does not match the generated alias of the grouping expression. The fix: if the user provides an alias for the function in the projection, do not generate a fresh alias for the grouping expression; reuse the user's alias.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark SPARK-3371

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2511.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2511

commit bad2fd00be2f5b79c08dace5cc107408bd5ca019
Author: ravipesala <ravindra.pes...@huawei.com>
Date: 2014-09-23T09:31:14Z

SPARK-3371 : Fixed Renaming a function expression with group by gives error

Signed-off-by: ravipesala <ravindra.pes...@huawei.com>

commit f8ace79e4015da06ea2821dfc8b9dfbc06fadd1c
Author: ravipesala <ravindra.pes...@huawei.com>
Date: 2014-09-23T18:43:27Z

Fixed the testcase issue
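Sketched outside Catalyst with stand-in types (Expr and Aliased are hypothetical; the real fix operates on Catalyst expressions), the idea is: when a grouping expression also appears aliased in the projection, reuse that alias instead of generating one.
```
object AliasDemo {
  case class Expr(sql: String)                 // stand-in for a Catalyst expression
  case class Aliased(expr: Expr, name: String) // stand-in for an Alias node

  def aliasGrouping(projections: Seq[Aliased], grouping: Seq[Expr]): Seq[Aliased] =
    grouping.zipWithIndex.map { case (g, i) =>
      projections.find(_.expr == g)                // did the user already name it?
        .map(p => Aliased(g, p.name))              // yes: reuse that alias
        .getOrElse(Aliased(g, s"_groupingexpr$i")) // no: generate one as before
    }

  def main(args: Array[String]): Unit = {
    val proj  = Seq(Aliased(Expr("len(foo)"), "a"))
    val group = Seq(Expr("len(foo)"))
    assert(aliasGrouping(proj, group) == Seq(Aliased(Expr("len(foo)"), "a")))
    println("ok")
  }
}
```
With the aliases now identical, the projection's reference to "a" resolves against the grouping expression instead of a mismatched generated name.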
[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2456

[SPARK-3536][SQL] SELECT on empty parquet table throws exception

Parquet returns null metadata when querying an empty parquet file while calculating splits, so a null check was added that returns empty splits.

Author: ravipesala <ravindra.pes...@huawei.com>

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/spark SPARK-3536

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2456.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2456

commit 1e81a50631b1f44ad7de65b83408a40218234745
Author: ravipesala <ravindra.pes...@huawei.com>
Date: 2014-09-18T18:02:46Z

Fixed the issue when querying on empty parquet file.
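The shape of the guard being described, as a sketch (safeSplits is a hypothetical helper; the actual change lives in Spark's parquet split calculation):
```
import java.util.{ArrayList => JArrayList, List => JList}

object EmptyParquetDemo {
  // Parquet hands back null instead of an empty list for an empty table,
  // so normalize null to an empty list before computing splits.
  def safeSplits[T](metadata: JList[T]): JList[T] =
    if (metadata == null) new JArrayList[T]() else metadata

  def main(args: Array[String]): Unit = {
    assert(safeSplits[String](null).isEmpty) // empty table: no footers, no splits
    println("ok")
  }
}
```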
[GitHub] spark pull request: [SPARK-3536][SQL] SELECT on empty parquet tabl...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2456#issuecomment-56157072

Please review.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17711807

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---

@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)

--- End diff --

Updated the code. Please review.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56001847

Updated as per the comments. Please review.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2381#issuecomment-56020207

OK. Closing this PR.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2381
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala closed the pull request at: https://github.com/apache/spark/pull/2390
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2390#issuecomment-56020320

OK. Closing this PR.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17659871

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---

+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)

--- End diff --

Thank you for your comment. Importing ```sqlContext._``` is a good idea; with it we can simplify the body to the code below. Please comment on it.
```
import sqlContext._
plan.registerTempTable(tableName)
cacheTable(tableName)
```
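With that import pulled inside the command, the whole method body would read roughly as below (a sketch against the Spark 1.1 API, where registerTempTable comes from the implicit LogicalPlan-to-SchemaRDD conversion that import sqlContext._ brings into scope; the exact result value is an assumption):
```
override protected[sql] lazy val sideEffectResult = {
  import sqlContext._
  plan.registerTempTable(tableName) // register the SELECT's plan under the name
  cacheTable(tableName)             // then cache it like any other table
  Seq.empty[Row]                    // the command itself produces no rows
}
```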
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17707447

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---

+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)

--- End diff --

It seems we cannot use ```import org.apache.spark.sql.SQLContext._``` at the beginning of the file to bring the implicits into scope, because there is no ```object``` defined for ```SQLContext``` and the implicits are members of ```class SQLContext``` only. We can import them only from an instance, as in ```import sqlContext._```. Please correct me if I am wrong.
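The Scala point here is easy to verify in isolation: implicits declared on a class can only be imported from an instance, because there is no companion object to import from. A tiny standalone demo (Ctx and Demo are illustrative names):
```
import scala.language.implicitConversions

class Ctx {
  implicit def intToStr(i: Int): String = "#" + i
}

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx = new Ctx
    import ctx._        // works: imports the instance's members, implicits included
    // import Ctx._     // would not compile: class Ctx has no companion object
    val s: String = 42  // resolved via ctx.intToStr
    println(s)          // #42
  }
}
```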