[jira] [Commented] (SPARK-5680) Sum function on all null values, should return zero
[ https://issues.apache.org/jira/browse/SPARK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587473#comment-14587473 ] Venkata Ramana G commented on SPARK-5680: - Holman, you are right that a column with all NULL values should return NULL. My motivation was to fix udaf_number_format.q: "select sum('a') from src" returns 0 in Hive and MySQL, while "select cast('a' as double) from src" returns NULL in Hive. I wrongly analysed this as "sum of all NULLs returns 0", which introduced the problem. I apologize for this and will submit a patch to revert that fix. Why "select sum('a') from src" returns 0 in Hive and MySQL, the behaviour that created this confusion, is still not clear. > Sum function on all null values, should return zero > --- > > Key: SPARK-5680 > URL: https://issues.apache.org/jira/browse/SPARK-5680 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Venkata Ramana G >Assignee: Venkata Ramana G >Priority: Minor > Fix For: 1.3.1, 1.4.0 > > > SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src; > Current output: > NULL NULL NULL NULL > Expected output: > 0.0 NULL NULL NULL > This fixes hive udaf_number_format.q -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
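For readers following this thread, a minimal illustration of the semantics under discussion, run from spark-shell with a HiveContext named sqlContext and the usual src table; the expected outputs below are taken from the discussion above, not re-verified here:

{code}
// sum over a column that is entirely NULL should yield NULL, not 0
sqlContext.sql("SELECT sum(CAST(NULL AS DOUBLE)) FROM src").collect()
// expected: Array([null])

// the confusing case: 'a' cannot be cast to a number, so every row contributes NULL
sqlContext.sql("SELECT cast('a' as double) FROM src LIMIT 1").collect()
// Hive: NULL

sqlContext.sql("SELECT sum('a') FROM src").collect()
// Hive/MySQL: 0, which is the inconsistency noted in the comment
{code}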
[jira] [Created] (SPARK-7646) Create table support to JDBC Datasource
Venkata Ramana G created SPARK-7646: --- Summary: Create table support to JDBC Datasource Key: SPARK-7646 URL: https://issues.apache.org/jira/browse/SPARK-7646 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Support creating a table through the JDBC data source (JDBCDataSource). The following is a usage example {code} df.saveAsTable( "testcreate2", "org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", "driver"->"com.h2.Driver") ) {code} If the table does not exist, this should create the table and write the DataFrame's content to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource
[ https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G updated SPARK-7601: Description: Support Insert into JDBCDataSource. Following are usage examples {code} sqlContext.sql( s""" |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') """.stripMargin.replaceAll("\n", " ")) sqlContext.sql("insert into table testram1 select * from testsrc").show {code} was: Support Insert into JDBCDataSource. Following are usage examples {code} df.saveAsTable( "testcreate2", "org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", "driver"->"com.h2.Driver") ) or sqlContext.sql( s""" |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') """.stripMargin.replaceAll("\n", " ")) sqlContext.sql("insert into table testram1 select * from testsrc").show {code} > Support Insert into JDBC Datasource > --- > > Key: SPARK-7601 > URL: https://issues.apache.org/jira/browse/SPARK-7601 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Venkata Ramana G > > Support Insert into JDBCDataSource. Following are usage examples > {code} > sqlContext.sql( > s""" > |CREATE TEMPORARY TABLE testram1 > |USING org.apache.spark.sql.jdbc > |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', > driver 'com.h2.Driver') > """.stripMargin.replaceAll("\n", " ")) > sqlContext.sql("insert into table testram1 select * from testsrc").show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource
[ https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G updated SPARK-7601: Description: Support Insert into JDBCDataSource. Following are usage examples {code} df.saveAsTable( "testcreate2", "org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", "driver"->"com.h2.Driver") ) or sqlContext.sql( s""" |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') """.stripMargin.replaceAll("\n", " ")) sqlContext.sql("insert into table testram1 select * from testsrc").show {code} was: Support Insert into JDBCDataSource. Following are usage examples {code} df.saveAsTable("testcreate2","org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", "driver"->"com.h2.Driver")) or sqlContext.sql( s""" |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') """.stripMargin.replaceAll("\n", " ")) sqlContext.sql("insert into table testram1 select * from testsrc").show {code} > Support Insert into JDBC Datasource > --- > > Key: SPARK-7601 > URL: https://issues.apache.org/jira/browse/SPARK-7601 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.1 >Reporter: Venkata Ramana G > > Support Insert into JDBCDataSource. Following are usage examples > {code} > df.saveAsTable( > "testcreate2", > "org.apache.spark.sql.jdbc", > org.apache.spark.sql.SaveMode.Overwrite, > Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", > "password"->"xx", "driver"->"com.h2.Driver") > ) > or > sqlContext.sql( > s""" > |CREATE TEMPORARY TABLE testram1 > |USING org.apache.spark.sql.jdbc > |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', > driver 'com.h2.Driver') > """.stripMargin.replaceAll("\n", " ")) > sqlContext.sql("insert into table testram1 select * from testsrc").show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7601) Support Insert into JDBC Datasource
Venkata Ramana G created SPARK-7601: --- Summary: Support Insert into JDBC Datasource Key: SPARK-7601 URL: https://issues.apache.org/jira/browse/SPARK-7601 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Support Insert into JDBCDataSource. Following are usage examples {code} df.saveAsTable("testcreate2","org.apache.spark.sql.jdbc", org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", "driver"->"com.h2.Driver")) or sqlContext.sql( s""" |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') """.stripMargin.replaceAll("\n", " ")) sqlContext.sql("insert into table testram1 select * from testsrc").show {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7484) Support passing jdbc connection properties for dataframe.createJDBCTable and insertIntoJDBC
Venkata Ramana G created SPARK-7484: --- Summary: Support passing jdbc connection properties for dataframe.createJDBCTable and insertIntoJDBC Key: SPARK-7484 URL: https://issues.apache.org/jira/browse/SPARK-7484 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Priority: Minor A few JDBC drivers, such as SybaseIQ, support passing the username and password only through connection properties, so the same needs to be supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
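A rough sketch of how such an API could be used. The property-taking overloads shown in the comments are hypothetical illustrations of the request, not the committed Spark signatures:

{code}
import java.util.Properties

// Credentials go through connection properties instead of the JDBC URL,
// which is what drivers such as SybaseIQ require.
val connProps = new Properties()
connProps.setProperty("user", "xx")
connProps.setProperty("password", "xx")

// Hypothetical overloads accepting the properties (names and signatures assumed):
// df.createJDBCTable(url, "testcreate2", allowExisting = false, connProps)
// df.insertIntoJDBC(url, "testcreate2", overwrite = false, connProps)
{code}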
[jira] [Commented] (SPARK-6451) Support CombineSum in Code Gen
[ https://issues.apache.org/jira/browse/SPARK-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375456#comment-14375456 ] Venkata Ramana G commented on SPARK-6451: - Working on the same. > Support CombineSum in Code Gen > -- > > Key: SPARK-6451 > URL: https://issues.apache.org/jira/browse/SPARK-6451 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Priority: Blocker > > Since we are using CombineSum at the reducer side for the SUM function, we > need to make it work in code gen. Otherwise, code gen will not convert > Aggregates with a SUM function to GeneratedAggregates (the code gen version). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5818) unable to use "add jar" in hql
[ https://issues.apache.org/jira/browse/SPARK-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366811#comment-14366811 ] Venkata Ramana G commented on SPARK-5818: - TranslatingClassLoader is used for Spark-shell, while current hive's add jar can work only with URLClassLoader. So jar has to be directly added to spark driver's class loader or its parent loader, in case of spark-shell I am working the same. > unable to use "add jar" in hql > -- > > Key: SPARK-5818 > URL: https://issues.apache.org/jira/browse/SPARK-5818 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0, 1.2.1 >Reporter: pengxu > > In the spark 1.2.1 and 1.2.0, it's unable the use the hive command "add jar" > in hql. > It seems that the problem in spark-2219 is still existed. > the problem can be reproduced as described in the below. Suppose the jar file > is named brickhouse-0.6.0.jar and is placed in the /tmp directory > {code} > spark-shell>import org.apache.spark.sql.hive._ > spark-shell>val sqlContext = new HiveContext(sc) > spark-shell>import sqlContext._ > spark-shell>hql("add jar /tmp/brickhouse-0.6.0.jar") > {code} > the error message is showed as blow > {code:title=Error Log} > 15/02/15 01:36:31 ERROR SessionState: Unable to register > /tmp/brickhouse-0.6.0.jar > Exception: org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be > cast to java.net.URLClassLoader > java.lang.ClassCastException: > org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be cast to > java.net.URLClassLoader > at > org.apache.hadoop.hive.ql.exec.Utilities.addToClassPath(Utilities.java:1921) > at > org.apache.hadoop.hive.ql.session.SessionState.registerJar(SessionState.java:599) > at > org.apache.hadoop.hive.ql.session.SessionState$ResourceType$2.preHook(SessionState.java:658) > at > org.apache.hadoop.hive.ql.session.SessionState.add_resource(SessionState.java:732) > at > org.apache.hadoop.hive.ql.session.SessionState.add_resource(SessionState.java:717) > at > org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:54) > at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:319) > at > org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276) > at > org.apache.spark.sql.hive.execution.AddJar.sideEffectResult$lzycompute(commands.scala:74) > at > org.apache.spark.sql.hive.execution.AddJar.sideEffectResult(commands.scala:73) > at > org.apache.spark.sql.execution.Command$class.execute(commands.scala:46) > at org.apache.spark.sql.hive.execution.AddJar.execute(commands.scala:68) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425) > at > org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58) > at org.apache.spark.sql.SchemaRDD.(SchemaRDD.scala:108) > at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102) > at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106) > at > $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:24) > at > $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:29) > at > $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) > at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33) > at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:35) > at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:37) > at $line30.$read$$iwC$$iwC$$iwC$$iwC.(:39) > at $line30.$read$$iwC$$iwC$$iwC.(:41) > at 
$line30.$read$$iwC$$iwC.(:43) > at $line30.$read$$iwC.(:45) > at $line30.$read.(:47) > at $line30.$read$.(:51) > at $line30.$read$.() > at $line30.$eval$.(:7) > at $line30.$eval$.() > at $line30.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828) > at > org.apache.spark.repl.S
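Relating to the comment at the top of this message: Hive's "add jar" casts the context class loader to java.net.URLClassLoader, which fails for the REPL's TranslatingClassLoader. A rough sketch of the general workaround the comment describes, adding the jar to a URLClassLoader further up the loader chain via reflection, is below; this is an illustration only, not the actual Spark patch:

{code}
import java.net.{URL, URLClassLoader}

// Sketch: walk up from the context class loader until a URLClassLoader is found,
// then add the jar to it reflectively. Error handling kept minimal.
def addJarToDriverClassLoader(jarPath: String): Unit = {
  var loader = Thread.currentThread().getContextClassLoader
  while (loader != null && !loader.isInstanceOf[URLClassLoader]) {
    loader = loader.getParent
  }
  loader match {
    case urlLoader: URLClassLoader =>
      val addURL = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL])
      addURL.setAccessible(true)
      addURL.invoke(urlLoader, new java.io.File(jarPath).toURI.toURL)
    case _ =>
      sys.error("No URLClassLoader found in the class loader chain")
  }
}
{code}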
[jira] [Created] (SPARK-5765) word split problem in run-example and compute-classpath
Venkata Ramana G created SPARK-5765: --- Summary: word split problem in run-example and compute-classpath Key: SPARK-5765 URL: https://issues.apache.org/jira/browse/SPARK-5765 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 1.2.1, 1.3.0, 1.1.2 Reporter: Venkata Ramana G Word split problem with the Spark directory path in the scripts run-example and compute-classpath.sh. This was introduced by the defect fix SPARK-4504. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5680) Sum function on all null values, should return zero
Venkata Ramana G created SPARK-5680: --- Summary: Sum function on all null values, should return zero Key: SPARK-5680 URL: https://issues.apache.org/jira/browse/SPARK-5680 Project: Spark Issue Type: Bug Components: SQL Reporter: Venkata Ramana G Priority: Minor SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src; Current output: NULL NULL NULL NULL Expected output: 0.0 NULL NULL NULL This fixes hive udaf_number_format.q -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4504) run-example fails if multiple example jars present in target folder
Venkata Ramana G created SPARK-4504: --- Summary: run-example fails if multiple example jars present in target folder Key: SPARK-4504 URL: https://issues.apache.org/jira/browse/SPARK-4504 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 1.1.0, 1.2.0 Reporter: Venkata Ramana G Priority: Minor It gives the following error: bin/run-example: line 39: [: /mnt/d/spark/spark/examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop1.0.4.jar: binary operator expected Failed to find Spark examples assembly in /mnt/d/spark/spark/lib or /mnt/d/spark/spark/examples/target You need to build Spark before running this program -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204263#comment-14204263 ] Venkata Ramana G commented on SPARK-4296: - Aliases are added implicitly to structure fields in GROUP BY for aggregate expressions, so the comparison of the aggregate expression against the GROUP BY expression fails: Upper(birthday#11.date AS date#17) is compared against Upper(birthday#11.date). > Throw "Expression not in GROUP BY" when using same expression in group by > clause and select clause > --- > > Key: SPARK-4296 > URL: https://issues.apache.org/jira/browse/SPARK-4296 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Shixiong Zhu > > When the input data has a complex structure, using the same expression in the group > by clause and the select clause will throw "Expression not in GROUP BY". > {code:java} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Birthday(date: String) > case class Person(name: String, birthday: Birthday) > val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), > Person("Jim", Birthday("1980-02-28")))) > people.registerTempTable("people") > val year = sqlContext.sql("select count(*), upper(birthday.date) from people > group by upper(birthday.date)") > year.collect > {code} > Here is the plan of year: > {code:java} > SchemaRDD[3] at RDD at SchemaRDD.scala:105 > == Query Plan == > == Physical Plan == > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression > not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: > Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date > AS date#9) AS c1#3] > Subquery people > LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at > ExistingRDD.scala:36 > {code} > The bug is the equality test for `Upper(birthday#1.date)` and > `Upper(birthday#1.date AS date#9)`. > Maybe Spark SQL needs a mechanism to compare Alias expressions and non-Alias > expressions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
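One way to realize the "mechanism to compare Alias expression and non-Alias expression" mentioned above is to strip aliases from both sides before comparing. The sketch below only illustrates the idea and is not the actual patch:

{code}
import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}

// Sketch: remove implicit aliases before comparing an aggregate expression
// with a GROUP BY expression, so that
// Upper(birthday#1.date AS date#9) matches Upper(birthday#1.date).
def stripAliases(e: Expression): Expression = e.transform {
  case a: Alias => a.child
}

def equalIgnoringAliases(aggExpr: Expression, groupByExpr: Expression): Boolean =
  stripAliases(aggExpr) == stripAliases(groupByExpr)
{code}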
[jira] [Commented] (SPARK-4263) PERCENTILE is not working
[ https://issues.apache.org/jira/browse/SPARK-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199741#comment-14199741 ] Venkata Ramana G commented on SPARK-4263: - Key looks like string type column according to error. Array type parameter to percentile is submitted under PR https://github.com/apache/spark/pull/2802 for [SPARK-3891] The same is added as part of its testcase. > PERCENTILE is not working > - > > Key: SPARK-4263 > URL: https://issues.apache.org/jira/browse/SPARK-4263 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Cheng Hao >Priority: Minor > > When query "select percentile(key, array(0, 0.5,1)) from src", it will throws > exception like: > {panel} > org.apache.hadoop.hive.ql.exec.NoMatchingMethodException: No matching method > for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (string, > array). Possible choices: _FUNC_(bigint, array) > _FUNC_(bigint, double) > at > org.apache.hadoop.hive.ql.exec.FunctionRegistry.getMethodInternal(FunctionRegistry.java:1213) > at > org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:84) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:56) > at > org.apache.spark.sql.hive.HiveUdaf.objectInspector$lzycompute(hiveUdfs.scala:234) > at > org.apache.spark.sql.hive.HiveUdaf.objectInspector(hiveUdfs.scala:233) > at org.apache.spark.sql.hive.HiveUdaf.dataType(hiveUdfs.scala:241) > at > org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:104) > at > org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143) > at > org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.sql.catalyst.plans.logical.Aggregate.output(basicOperators.scala:143) > at > org.apache.spark.sql.catalyst.plans.logical.Limit.output(basicOperators.scala:147) > at > org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61) > at > org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61) > at scala.Option.getOrElse(Option.scala:120) > at > org.apache.spark.sql.catalyst.planning.PhysicalOperation$.unapply(patterns.scala:61) > at > org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59) > at > org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418) > at > org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416) > at > 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422) > at > org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:425) > at > org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:59) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:276) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:211) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4234) Always do partial aggregation
[ https://issues.apache.org/jira/browse/SPARK-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198694#comment-14198694 ] Venkata Ramana G commented on SPARK-4234: - As I understand it, HiveUDAF and HiveGenericUDAF currently support only AggregateExpression and do not support partial aggregation. So we should support PartialAggregate for HiveUDAF and HiveGenericUDAF, using Hive UDAF's partial aggregation interfaces. Please correct my understanding. > Always do partial aggregation > --- > > Key: SPARK-4234 > URL: https://issues.apache.org/jira/browse/SPARK-4234 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Cheng Hao > > Currently, a UDAF developer optionally implements a partial aggregation > function; however, allowing that can cause performance issues. > We could instead always force developers to provide the partial aggregation > function, as Hive does, so that we always get the `mapside` aggregation > optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
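For reference on what "Hive UDAF's partial aggregation interfaces" means here, the sketch below names the evaluator modes Hive uses; the surrounding Scala is illustrative only:

{code}
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator

// Hive's partial-aggregation lifecycle (mode names are Hive's own):
//   PARTIAL1: iterate() over raw rows, then terminatePartial()   -> map side
//   PARTIAL2: merge() partial results, then terminatePartial()
//   FINAL:    merge() partial results, then terminate()          -> reduce side
//   COMPLETE: iterate() then terminate(), i.e. no partial step
val mapSideMode = GenericUDAFEvaluator.Mode.PARTIAL1
val reduceSideMode = GenericUDAFEvaluator.Mode.FINAL
{code}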
[jira] [Commented] (SPARK-4252) SparkSQL behaves differently from Hive when encountering illegal record
[ https://issues.apache.org/jira/browse/SPARK-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198570#comment-14198570 ] Venkata Ramana G commented on SPARK-4252: - When I executed the same over Hive 0.12 from the Hive command line, it gave this result: hive> select * from user; OK Alice 12 Bob 13 > SparkSQL behaves differently from Hive when encountering illegal record > --- > > Key: SPARK-4252 > URL: https://issues.apache.org/jira/browse/SPARK-4252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: patrickliu > > Hive will ignore an illegal record, while SparkSQL will try to convert it. > Assume I have a text file user.txt with 2 records (userName, age): > Alice,12.4 > Bob,13 > Then I create a Hive table to query the data: > CREATE TABLE user( > name string, > age int, (Pay attention! The field is int) > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ; > LOAD DATA LOCAL INPATH 'user' INTO TABLE user; > Then I use Hive and SparkSQL to query the 'user' table: > SQL: select * from user; > Result by Hive: > Alice NULL (Hive ignores Alice's age because it is a float number) > Bob 13 > Result by SparkSQL: > Alice 12 (SparkSQL converts Alice's age from float to int) > Bob 13 > So if I run "select sum(age) from user;" > I will get a different result. > Maybe SparkSQL should be compatible with Hive in this scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3593) Support Sorting of Binary Type Data
[ https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G resolved SPARK-3593. - Resolution: Fixed > Support Sorting of Binary Type Data > --- > > Key: SPARK-3593 > URL: https://issues.apache.org/jira/browse/SPARK-3593 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.1.0 >Reporter: Paul Magid >Assignee: Venkata Ramana G > Fix For: 1.2.0 > > > If you try sorting on a binary field you currently get an exception. Please > add support for binary data type sorting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-3593) Support Sorting of Binary Type Data
[ https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G reopened SPARK-3593: - Reopened to assign to myself > Support Sorting of Binary Type Data > --- > > Key: SPARK-3593 > URL: https://issues.apache.org/jira/browse/SPARK-3593 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.1.0 >Reporter: Paul Magid > Fix For: 1.2.0 > > > If you try sorting on a binary field you currently get an exception. Please > add support for binary data type sorting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4217) Result of SparkSQL is incorrect after a table join and group by operation
[ https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197799#comment-14197799 ] Venkata Ramana G commented on SPARK-4217: - I executed them on Hive 0.12 (from the Hive command line) and on the latest Spark SQL master (from spark-shell, using HiveContext connecting to Hive 0.12). > Result of SparkSQL is incorrect after a table join and group by operation > - > > Key: SPARK-4217 > URL: https://issues.apache.org/jira/browse/SPARK-4217 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 > Environment: Hadoop 2.2.0 > Spark1.1 >Reporter: peter.zhang >Priority: Critical > Attachments: TestScript.sql, saledata.zip > > > I ran a test using the same SQL script in SparkSQL, Shark and Hive > environment (pure Hive application rather than Spark HiveContext) as below > --- > select c.theyear, sum(b.amount) > from tblstock a > join tblStockDetail b on a.ordernumber = b.ordernumber > join tbldate c on a.dateid = c.dateid > group by c.theyear; > result of hive/shark: > theyear _c1 > 2004 1403018 > 2005 5557850 > 2006 7203061 > 2007 11300432 > 2008 12109328 > 2009 5365447 > 2010 188944 > result of SparkSQL: > 2010 210924 > 2004 3265696 > 2005 13247234 > 2006 13670416 > 2007 16711974 > 2008 14670698 > 2009 6322137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4217) Result of SparkSQL is incorrect after a table join and group by operation
[ https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196674#comment-14196674 ] Venkata Ramana G commented on SPARK-4217: - I have executed this on Hive and SparkSQL; it looks like your "Hive" result is wrong. I got the same result on SparkSQL and Hive. SparkSQL: [2010,210924] [2004,3265696] [2005,13247234] [2006,13670416] [2007,16711974] [2008,14670698] [2009,6322137] Hive: 2004 3265696, 2005 13247234, 2006 13670416, 2007 16711974, 2008 14670698, 2009 6322137, 2010 210924 > Result of SparkSQL is incorrect after a table join and group by operation > - > > Key: SPARK-4217 > URL: https://issues.apache.org/jira/browse/SPARK-4217 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 > Environment: Hadoop 2.2.0 > Spark1.1 >Reporter: peter.zhang >Priority: Critical > Attachments: TestScript.sql, saledata.zip > > > I ran a test using the same SQL script in SparkSQL, Shark and Hive environment > as below > --- > select c.theyear, sum(b.amount) > from tblstock a > join tblStockDetail b on a.ordernumber = b.ordernumber > join tbldate c on a.dateid = c.dateid > group by c.theyear; > result of hive/shark: > theyear _c1 > 2004 1403018 > 2005 5557850 > 2006 7203061 > 2007 11300432 > 2008 12109328 > 2009 5365447 > 2010 188944 > result of SparkSQL: > 2010 210924 > 2004 3265696 > 2005 13247234 > 2006 13670416 > 2007 16711974 > 2008 14670698 > 2009 6322137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4201) Can't use concat() on partition column in where condition (Hive compatibility problem)
[ https://issues.apache.org/jira/browse/SPARK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194847#comment-14194847 ] Venkata Ramana G commented on SPARK-4201: - I found the same is working on latest master, please confirm. > Can't use concat() on partition column in where condition (Hive compatibility > problem) > -- > > Key: SPARK-4201 > URL: https://issues.apache.org/jira/browse/SPARK-4201 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.0.0, 1.1.0 > Environment: Hive 0.12+hadoop 2.4/hadoop 2.2 +spark 1.1 >Reporter: dongxu >Priority: Minor > Labels: com > > The team used hive to query,we try to move it to spark-sql. > when I search sentences like that. > select count(1) from gulfstream_day_driver_base_2 where > concat(year,month,day) = '20140929'; > It can't work ,but it work well in hive. > I have to rewrite the sql to "select count(1) from > gulfstream_day_driver_base_2 where year = 2014 and month = 09 day= 29. > There are some error log. > 14/11/03 15:05:03 ERROR SparkSQLDriver: Failed in [select count(1) from > gulfstream_day_driver_base_2 where concat(year,month,day) = '20140929'] > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Aggregate false, [], [SUM(PartialCount#1390L) AS c_0#1337L] > Exchange SinglePartition > Aggregate true, [], [COUNT(1) AS PartialCount#1390L] >HiveTableScan [], (MetastoreRelation default, > gulfstream_day_driver_base_2, None), > Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341) > = 20140929)) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) > at org.apache.spark.sql.execution.Aggregate.execute(Aggregate.scala:126) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:415) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > execute, tree: > Exchange SinglePartition > Aggregate true, [], [COUNT(1) AS PartialCount#1390L] > HiveTableScan [], (MetastoreRelation default, gulfstream_day_driver_base_2, > None), > Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341) > = 20140929)) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) > at 
org.apache.spark.sql.execution.Exchange.execute(Exchange.scala:44) > at > org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1.apply(Aggregate.scala:128) > at > org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1.apply(Aggregate.scala:127) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46) > ... 16 more > Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > execute, tree: > Aggregate true, [], [COUNT(1) AS PartialCount#1390L] > HiveTableScan [], (MetastoreRelation default, gulfstream_day_driver_base_2, > None), > Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341) > = 20140929)) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) > at org.apache.spark.sql.execution.Aggregate.execute(Aggregate.scala:126) > at > org.apache.spark.sql.execution.Exchange$$anonfun$execute$1.apply(Exchange.scala:86) > at > org.apache.spark.sql.execution.Exchange$$anonfun$execute$1.apply(Exchange.scal
[jira] [Commented] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values
[ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191299#comment-14191299 ] Venkata Ramana G commented on SPARK-4077: - I could not find this behaviour for non-text sources. In the case of runSqlHive, rows are serialized one by one (org.apache.hadoop.hive.ql.exec.FetchTask.fetch), so this problem will not be visible. > A broken string timestamp value can Spark SQL return wrong values for valid > string timestamp values > --- > > Key: SPARK-4077 > URL: https://issues.apache.org/jira/browse/SPARK-4077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yin Huai >Assignee: Venkata Ramana G > > The following case returns wrong results. > The text file is > {code} > 2014-12-11 00:00:00,1 > 2014-12-11astring00:00:00,2 > {code} > The DDL statement and the query are shown below... > {code} > sql(""" > create external table date_test(my_date timestamp, id int) > row format delimited > fields terminated by ',' > lines terminated by '\n' > LOCATION 'dateTest' > """) > sql("select * from date_test").collect.foreach(println) > {code} > The result is > {code} > [1969-12-31 19:00:00.0,1] > [null,2] > {code} > If I change the data to > {code} > 2014-12-11 00:00:00,1 > 2014-12-11 00:00:00,2 > {code} > The result is fine. > For the data with broken string timestamp value, I tried runSqlHive. The > result is fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values
[ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020 ] Venkata Ramana G edited comment on SPARK-4077 at 10/30/14 1:05 PM: --- In org.apache.hadoop.hive.serde2.io.TimestampWritable.set , if the next entry is null then current time stamp object is being reset. Not sure why it is done like that in hive. We also can raise a bug in hive. However because of this hiveinspectors:unwrap cannot use the same timestamp object without creating a copy. was (Author: gvramana): In org.apache.hadoop.hive.serde2.io.TimestampWritable.init , if the next entry is null then current time stamp object is being reset. Not sure why it is done like that in hive. We also can raise a bug in hive. However because of this hiveinspectors:unwrap cannot use the same timestamp object without creating a copy. > A broken string timestamp value can Spark SQL return wrong values for valid > string timestamp values > --- > > Key: SPARK-4077 > URL: https://issues.apache.org/jira/browse/SPARK-4077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yin Huai >Assignee: Venkata Ramana G > > The following case returns wrong results. > The text file is > {code} > 2014-12-11 00:00:00,1 > 2014-12-11astring00:00:00,2 > {code} > The DDL statement and the query are shown below... > {code} > sql(""" > create external table date_test(my_date timestamp, id int) > row format delimited > fields terminated by ',' > lines terminated by '\n' > LOCATION 'dateTest' > """) > sql("select * from date_test").collect.foreach(println) > {code} > The result is > {code} > [1969-12-31 19:00:00.0,1] > [null,2] > {code} > If I change the data to > {code} > 2014-12-11 00:00:00,1 > 2014-12-11 00:00:00,2 > {code} > The result is fine. > For the data with broken string timestamp value, I tried runSqlHive. The > result is fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
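A small sketch of the defensive copy the comment says HiveInspectors' unwrap needs; this illustrates the idea only and is not the exact patch:

{code}
import java.sql.Timestamp
import org.apache.hadoop.hive.serde2.io.TimestampWritable

// Sketch: copy the Timestamp held by a (reused) TimestampWritable rather than
// handing out the shared instance, so a later reset cannot corrupt earlier rows.
def unwrapTimestamp(w: TimestampWritable): Timestamp = {
  val shared = w.getTimestamp
  val copy = new Timestamp(shared.getTime)
  copy.setNanos(shared.getNanos)
  copy
}
{code}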
[jira] [Commented] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values
[ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020 ] Venkata Ramana G commented on SPARK-4077: - In org.apache.hadoop.hive.serde2.io.TimestampWritable.init , if the next entry is null then current time stamp object is being reset. Not sure why it is done like that in hive. We also can raise a bug in hive. However because of this hiveinspectors:unwrap cannot use the same timestamp object without creating a copy. > A broken string timestamp value can Spark SQL return wrong values for valid > string timestamp values > --- > > Key: SPARK-4077 > URL: https://issues.apache.org/jira/browse/SPARK-4077 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yin Huai >Assignee: Venkata Ramana G > > The following case returns wrong results. > The text file is > {code} > 2014-12-11 00:00:00,1 > 2014-12-11astring00:00:00,2 > {code} > The DDL statement and the query are shown below... > {code} > sql(""" > create external table date_test(my_date timestamp, id int) > row format delimited > fields terminated by ',' > lines terminated by '\n' > LOCATION 'dateTest' > """) > sql("select * from date_test").collect.foreach(println) > {code} > The result is > {code} > [1969-12-31 19:00:00.0,1] > [null,2] > {code} > If I change the data to > {code} > 2014-12-11 00:00:00,1 > 2014-12-11 00:00:00,2 > {code} > The result is fine. > For the data with broken string timestamp value, I tried runSqlHive. The > result is fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate
[ https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179813#comment-14179813 ] Venkata Ramana G commented on SPARK-3815: - Still the issue is not re-producible, both limit and without limit is working fine for me. As LIMIT triggers closure clean up, there must be some other reason for exception thrown during clean up. I think exact data set and script is required to reproduce. Check if this issue is any thing related to SPARK-3517 https://github.com/apache/spark/pull/2376 may be you can try that patch. > LPAD function does not work in where predicate > -- > > Key: SPARK-3815 > URL: https://issues.apache.org/jira/browse/SPARK-3815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yana Kadiyska >Priority: Minor > > select customer_id from mytable where > pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2 > produces: > 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing > query: > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) > at > org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) > at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597) > at > org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360) > at > org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) > at > org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526) > at > org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor > at > 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOut
[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate
[ https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176226#comment-14176226 ] Venkata Ramana G commented on SPARK-3815: - I found this working fine on the latest release. [~yanakad] Can you please reverify? thanks > LPAD function does not work in where predicate > -- > > Key: SPARK-3815 > URL: https://issues.apache.org/jira/browse/SPARK-3815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Yana Kadiyska >Priority: Minor > > select customer_id from mytable where > pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2 > produces: > 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing > query: > org.apache.spark.SparkException: Task not serializable > at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) > at > org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158) > at org.apache.spark.SparkContext.clean(SparkContext.scala:1242) > at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597) > at > org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360) > at > org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360) > at > org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193) > at > org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175) > at > org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58) > at > org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526) > at > org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at scala.collection.immutable.$colon$colon.writeObject(List.scala:379) > at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > The following wo
[jira] [Commented] (SPARK-3891) Support Hive Percentile UDAF with array of percentile values
[ https://issues.apache.org/jira/browse/SPARK-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171019#comment-14171019 ] Venkata Ramana G commented on SPARK-3891: - Following problems need to be fixed to passing array to percentile and percentile_approx UDAFs 1. percentile UDAF the parameters are not wrapped before passing to UDAF 2. percentile_approx takes only constant inspector as parameter, so constant inspectors support needs to be added to GenericUDAF. > Support Hive Percentile UDAF with array of percentile values > > > Key: SPARK-3891 > URL: https://issues.apache.org/jira/browse/SPARK-3891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 > Environment: Spark 1.2.0 trunk > (ac302052870a650d56f2d3131c27755bb2960ad7) on > CDH 5.1.0 > Centos 6.5 > 8x 2GHz, 24GB RAM >Reporter: Anand Mohan Tumuluri >Assignee: Venkata Ramana G > > Spark PR 2620 brings in the support of Hive percentile UDAF. > However Hive percentile and percentile_approx UDAFs also support returning an > array of percentile values with the syntax > percentile(BIGINT col, array(p1 [, p2]...)) or > percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) > These queries are failing with the below error: > 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, > percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name; > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com): > java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be > cast to [Ljava.lang.Object; > > org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83) > > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259) > > org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349) > > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170) > org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342) > > org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167) > > org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599) > org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599) > > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > org.apache.spark.scheduler.Task.run(Task.scala:56) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > java.lang.Thread.run(Thread.java:745) > Driver stacktrace: (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements
[ https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169234#comment-14169234 ] Venkata Ramana G commented on SPARK-2155: - We can separate CASE KEY WHEN and CASE WHEN into two expressions with a common abstract base class containing the shared code. This will address the redundant evaluation problem without duplicating code. > Support effectful / non-deterministic key expressions in CASE WHEN statements > - > > Key: SPARK-2155 > URL: https://issues.apache.org/jira/browse/SPARK-2155 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Zongheng Yang >Priority: Minor > > Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant > evaluations of the key expression. Relevant discussions here: > https://github.com/apache/spark/pull/1055/files#r13784248 > If we really need support for effectful key expressions, at least we > can resort to the baseline approach of having both CaseWhen and CaseKeyWhen > as expressions, which seems to introduce much code duplication (e.g. see > https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216 > for a sketch implementation). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
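A self-contained sketch of that shape, using simplified stand-ins rather than the actual Catalyst classes, showing how a shared base class avoids duplicating the branch-selection logic while CASE KEY WHEN evaluates its key only once:

{code}
// Simplified model (not Catalyst's Expression hierarchy).
trait Expr { def eval(row: Map[String, Any]): Any }

abstract class CaseWhenBase extends Expr {
  def branches: Seq[(Expr, Expr)]   // (WHEN expression, THEN result) pairs
  def elseValue: Option[Expr]
  // Shared branch selection: first matching THEN result, else the ELSE value, else null.
  protected def firstMatch(row: Map[String, Any], matches: ((Expr, Expr)) => Boolean): Any =
    branches.find(matches).map(_._2.eval(row)).orElse(elseValue.map(_.eval(row))).getOrElse(null)
}

// CASE WHEN cond1 THEN r1 WHEN cond2 THEN r2 ... ELSE e END
case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends CaseWhenBase {
  def eval(row: Map[String, Any]): Any =
    firstMatch(row, { case (cond, _) => cond.eval(row) == true })
}

// CASE key WHEN v1 THEN r1 WHEN v2 THEN r2 ... ELSE e END
case class CaseKeyWhen(key: Expr, branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends CaseWhenBase {
  def eval(row: Map[String, Any]): Any = {
    val k = key.eval(row)  // the key is evaluated exactly once, even if non-deterministic
    firstMatch(row, { case (value, _) => value.eval(row) == k })
  }
}
{code}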
[jira] [Commented] (SPARK-3892) Map type should have typeName
[ https://issues.apache.org/jira/browse/SPARK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166724#comment-14166724 ] Venkata Ramana G commented on SPARK-3892: - Can you explain in detail? > Map type should have typeName > - > > Key: SPARK-3892 > URL: https://issues.apache.org/jira/browse/SPARK-3892 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"
[ https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G closed SPARK-3879. --- Resolution: Duplicate > spark-shell.cmd fails giving error "!=x was unexpected at this time" > > > Key: SPARK-3879 > URL: https://issues.apache.org/jira/browse/SPARK-3879 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: Windows >Reporter: Venkata Ramana G > > spark-shell.cmd giving error "!=x was unexpected at this time" > This problem is introduced during SPARK-2058 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"
[ https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165225#comment-14165225 ] Venkata Ramana G commented on SPARK-3879: - This was already fixed under SPARK-3808, so this issue can be closed. > spark-shell.cmd fails giving error "!=x was unexpected at this time" > > > Key: SPARK-3879 > URL: https://issues.apache.org/jira/browse/SPARK-3879 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: Windows >Reporter: Venkata Ramana G > > spark-shell.cmd giving error "!=x was unexpected at this time" > This problem is introduced during SPARK-2058 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"
[ https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165209#comment-14165209 ] Venkata Ramana G commented on SPARK-3879: - I have fixed this and am about to submit a PR. > spark-shell.cmd fails giving error "!=x was unexpected at this time" > > > Key: SPARK-3879 > URL: https://issues.apache.org/jira/browse/SPARK-3879 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: Windows >Reporter: Venkata Ramana G > > spark-shell.cmd giving error "!=x was unexpected at this time" > This problem is introduced during SPARK-2058 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"
Venkata Ramana G created SPARK-3879: --- Summary: spark-shell.cmd fails giving error "!=x was unexpected at this time" Key: SPARK-3879 URL: https://issues.apache.org/jira/browse/SPARK-3879 Project: Spark Issue Type: Bug Components: Spark Shell Environment: Windows Reporter: Venkata Ramana G spark-shell.cmd giving error "!=x was unexpected at this time" This problem is introduced during SPARK-2058 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3845) SQLContext(...) should inherit configurations from SparkContext
[ https://issues.apache.org/jira/browse/SPARK-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163367#comment-14163367 ] Venkata Ramana G commented on SPARK-3845: - As I understand it, that is already how it works: all configuration options that start with "spark.sql" are copied from the SparkContext into the SQLContext. > SQLContext(...) should inherit configurations from SparkContext > --- > > Key: SPARK-3845 > URL: https://issues.apache.org/jira/browse/SPARK-3845 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: Jianshi Huang > > It's very confusing that Spark configurations (e.g. spark.serializer, > spark.speculation, etc.) can be set in the spark-default.conf file, while > SparkSQL configurations (e.g. spark.sql.inMemoryColumnarStorage.compressed, > spark.sql.codegen, etc.) has to be set either in sqlContext.setConf or > sql("SET ..."). > When I do: > val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext) > I would expect sqlContext recognizes all the SQL configurations comes with > sparkContext. > Jianshi -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
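A minimal sketch of that behaviour, assuming a 1.1-era SparkContext/SQLContext; the property names are taken from the report, and getConf with a default is used only to show the setting is visible on the SQL side.
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Any "spark.sql.*" entry set on the SparkConf (or in spark-defaults.conf) ...
val conf = new SparkConf()
  .setAppName("sql-conf-inheritance")
  .setMaster("local[*]")
  .set("spark.sql.inMemoryColumnarStorage.compressed", "true")
  .set("spark.sql.codegen", "true")

val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// ... should already be visible through the SQLContext's own conf.
println(sqlContext.getConf("spark.sql.codegen", "<unset>"))   // expected: true
{code}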
[jira] [Commented] (SPARK-3559) appendReadColumnIDs and appendReadColumnNames introduce unnecessary columns in the lists of needed column ids and column names stored in hiveConf
[ https://issues.apache.org/jira/browse/SPARK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163218#comment-14163218 ] Venkata Ramana G commented on SPARK-3559: - Because the same hiveConf is shared across queries, the column lists keep growing as ids and names are appended, and there is no way to restrict them to only the required columns. The HiveConf can be cloned at the TableScanOperator, which can then configure the required properties on its own copy. Note that the deserializers expect this property to be set in the HiveConf, not in table-specific properties. > appendReadColumnIDs and appendReadColumnNames introduce unnecessary columns > in the lists of needed column ids and column names stored in hiveConf > - > > Key: SPARK-3559 > URL: https://issues.apache.org/jira/browse/SPARK-3559 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Priority: Blocker > > Because we are using the same hiveConf and we are currently using > ColumnProjectionUtils.appendReadColumnIDs > ColumnProjectionUtils.appendReadColumnNames to append needed column ids and > names for a table, lists of needed column ids and names can have unnecessary > columns. > Also, for a join operation, TableScanOperators for both tables are sharing > the same hiveConf and they may need to set table-specific properties. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
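As a rough sketch of the cloning approach (illustrative only: the helper name scanSpecificConf is made up, and the calls assume the Hive 0.12/0.13-era ColumnProjectionUtils API named in the issue):
{code}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils

// Give each table scan its own copy of the conf before appending its column
// ids/names, so scans over different tables stop polluting a shared projection list.
def scanSpecificConf(shared: HiveConf, neededIds: Seq[Int], neededNames: Seq[String]): HiveConf = {
  val localConf = new HiveConf(shared)   // clone; the shared conf stays untouched
  ColumnProjectionUtils.appendReadColumnIDs(localConf, java.util.Arrays.asList(neededIds.map(Int.box): _*))
  ColumnProjectionUtils.appendReadColumnNames(localConf, java.util.Arrays.asList(neededNames: _*))
  localConf
}
{code}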
[jira] [Issue Comment Deleted] (SPARK-3034) [HIve] java.sql.Date cannot be cast to java.sql.Timestamp
[ https://issues.apache.org/jira/browse/SPARK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Ramana G updated SPARK-3034: Comment: was deleted (was: Requires adding another data type DateType Modifications required in parser, datatype addition and DataType conversion to and from TimeStamp and String. Compatibility with Date supported in Hive 0.12.0. Date UDFs compatibility. Started working on the same.) > [HIve] java.sql.Date cannot be cast to java.sql.Timestamp > - > > Key: SPARK-3034 > URL: https://issues.apache.org/jira/browse/SPARK-3034 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.0.2 >Reporter: pengyanhong >Priority: Blocker > > run a simple HiveQL via yarn-cluster, got error as below: > {quote} > Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:199) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0.0:127 failed 3 times, most recent failure: Exception failure in TID > 141 on host A01-R06-I147-41.jd.local: java.lang.ClassCastException: > java.sql.Date cannot be cast to java.sql.Timestamp > > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33) > > org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:251) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:200) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:192) > scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) > org.apache.spark.scheduler.Task.run(Task.scala:51) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > java.lang.Thread.run(Thread.java:662) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031) > at > 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) > at akka.actor.ActorCell.invoke(ActorCell.scala:456) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) > at > akka.dispatch.ForkJ
[jira] [Commented] (SPARK-3034) [HIve] java.sql.Date cannot be cast to java.sql.Timestamp
[ https://issues.apache.org/jira/browse/SPARK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159998#comment-14159998 ] Venkata Ramana G commented on SPARK-3034: - This requires adding another data type, DateType. Modifications are required in the parser, in the data type definitions, and in the DataType conversions to and from Timestamp and String, along with compatibility with the Date type supported in Hive 0.12.0 and compatibility of the Date UDFs. I have started working on this. > [HIve] java.sql.Date cannot be cast to java.sql.Timestamp > - > > Key: SPARK-3034 > URL: https://issues.apache.org/jira/browse/SPARK-3034 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.0.2 >Reporter: pengyanhong >Priority: Blocker > > run a simple HiveQL via yarn-cluster, got error as below: > {quote} > Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:199) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0.0:127 failed 3 times, most recent failure: Exception failure in TID > 141 on host A01-R06-I147-41.jd.local: java.lang.ClassCastException: > java.sql.Date cannot be cast to java.sql.Timestamp > > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33) > > org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:251) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439) > > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:200) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:192) > scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158) > > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158) > org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) > org.apache.spark.scheduler.Task.run(Task.scala:51) > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > java.lang.Thread.run(Thread.java:662) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033) > at > 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) > at akka.actor.ActorCell.invoke(ActorCell.scala:456) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) > at akka.dispatch.Mailbox.run(Mailbox.scala:219) >
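To make the conversion work concrete, here is a small hedged sketch of the kinds of conversions the comment above describes between the new date type, Timestamp and String. The helper names are made up for illustration; the actual change would add a DateType to Catalyst rather than standalone functions.
{code}
import java.sql.{Date, Timestamp}

// Date <-> Timestamp: a date maps to midnight of that day; a timestamp is truncated to its day.
def dateToTimestamp(d: Date): Timestamp = new Timestamp(d.getTime)
def timestampToDate(t: Timestamp): Date = Date.valueOf(t.toString.substring(0, 10))

// Date <-> String, using the "yyyy-MM-dd" form that Hive 0.12.0's DATE type uses.
def dateToString(d: Date): String = d.toString
def stringToDate(s: String): Date = Date.valueOf(s)
{code}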
[jira] [Commented] (SPARK-3593) Support Sorting of Binary Type Data
[ https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154863#comment-14154863 ] Venkata Ramana G commented on SPARK-3593: - BinaryType is currently not derived from NativeType and has no Ordering support. BinaryType can be moved under NativeType, since its JvmType is already defined; what remains is to implement an Ordering for it. Hive also classifies BinaryType as a primitive type, keeping complex types such as arrays, maps, structs and unions separate. This is similar to the current handling of TimestampType. > Support Sorting of Binary Type Data > --- > > Key: SPARK-3593 > URL: https://issues.apache.org/jira/browse/SPARK-3593 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.1.0 >Reporter: Paul Magid > > If you try sorting on a binary field you currently get an exception. Please > add support for binary data type sorting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
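The missing piece is essentially an Ordering over the binary JvmType (Array[Byte]). A hedged sketch of one such ordering follows; lexicographic comparison over unsigned bytes (as Hadoop's BytesWritable does) is one reasonable choice, but the exact semantics would need to be checked against Hive.
{code}
// Lexicographic ordering over Array[Byte]; bytes are compared as unsigned values.
val binaryOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
  def compare(x: Array[Byte], y: Array[Byte]): Int = {
    val len = math.min(x.length, y.length)
    var i = 0
    while (i < len) {
      val cmp = (x(i) & 0xff) - (y(i) & 0xff)
      if (cmp != 0) return cmp
      i += 1
    }
    x.length - y.length   // a strict prefix sorts first
  }
}
{code}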
[jira] [Commented] (SPARK-3268) DoubleType should support modulus
[ https://issues.apache.org/jira/browse/SPARK-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138763#comment-14138763 ] Venkata Ramana G commented on SPARK-3268: - Problem: double, decimal and float fall under Scala's fractional type hierarchy, while the % (rem) function is implemented in the integral type hierarchy. Currently no mechanism exists to let a fractional type use integral operations. Solution: Scala provides classes such as DoubleAsIfIntegral, FloatAsIfIntegral and BigDecimalAsIfIntegral that let these types work with integral operators, so we can add asIntegral to FractionalType to support calling Integral-related functions, and the i2 function can dispatch through asIntegral to execute them. I have implemented this; writing test cases is pending, after which I will submit the patch. > DoubleType should support modulus > -- > > Key: SPARK-3268 > URL: https://issues.apache.org/jira/browse/SPARK-3268 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Chris Grier >Priority: Minor > > Using the modulus operator (%) on Doubles throws an exception. > eg: > SELECT 1388632775.0 % 60 from tablename LIMIT 1 > Throws: > java.lang.Exception: Type DoubleType does not support numeric operations -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
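A quick, self-contained illustration of the Scala facility the comment relies on: scala.math.Numeric.DoubleAsIfIntegral is an Integral[Double], so its rem method provides the modulus that DoubleType needs. This only demonstrates the standard-library behaviour, not the Spark patch itself.
{code}
import scala.math.Numeric

// Integral view of Double from the Scala standard library.
val doubleAsIntegral: Integral[Double] = Numeric.DoubleAsIfIntegral

// Matches the failing query from the report: SELECT 1388632775.0 % 60
val r = doubleAsIntegral.rem(1388632775.0, 60.0)
println(r)   // 35.0
{code}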
[jira] [Commented] (SPARK-2189) Method for removing temp tables created by registerAsTable
[ https://issues.apache.org/jira/browse/SPARK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110740#comment-14110740 ] Venkata Ramana G commented on SPARK-2189: - unregisterTempTable("cachedTableName"): this API should also uncache registered tables backed by an InMemoryRelation. I could not find any useful use case where the cache is still required after unregisterTempTable. If there is a valid use case, the API can be modified to give the user more control: unregisterTempTable("cachedTableName", unCacheTables=true). By default this API should uncache registered tables backed by an InMemoryRelation, but the user could pass unCacheTables=false to change the behaviour. Please comment. > Method for removing temp tables created by registerAsTable > -- > > Key: SPARK-2189 > URL: https://issues.apache.org/jira/browse/SPARK-2189 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.0.0 >Reporter: Michael Armbrust > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
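A hedged sketch of the proposed API as it might look as a method inside SQLContext. The signature follows the comment above; the body is only indicative (isCached and uncacheTable are existing SQLContext methods, while the catalog call assumes the 1.0/1.1-era Catalog API), and this is not an existing Spark method.
{code}
// Proposed, not existing: remove a temp table and, by default, drop its cached
// InMemoryRelation as well; pass unCacheTables = false to keep the cache.
def unregisterTempTable(tableName: String, unCacheTables: Boolean = true): Unit = {
  if (unCacheTables && isCached(tableName)) {
    uncacheTable(tableName)
  }
  catalog.unregisterTable(None, tableName)   // drop the temp-table registration
}
{code}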
[jira] [Commented] (SPARK-2189) Method for removing temp tables created by registerAsTable
[ https://issues.apache.org/jira/browse/SPARK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108768#comment-14108768 ] Venkata Ramana G commented on SPARK-2189: - Please assign this to me. > Method for removing temp tables created by registerAsTable > -- > > Key: SPARK-2189 > URL: https://issues.apache.org/jira/browse/SPARK-2189 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.0.0 >Reporter: Michael Armbrust > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org