[jira] [Commented] (SPARK-5680) Sum function on all null values, should return zero

2015-06-15 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587473#comment-14587473
 ] 

Venkata Ramana G commented on SPARK-5680:
-

Holman, you are right that a column with all NULL values should return NULL.
My motivation was to fix udaf_number_format.q: "select sum('a') from src" 
returns 0 in Hive and MySQL, while "select cast('a' as double) from src" 
returns NULL in Hive.
I wrongly analysed this as "sum of all NULLs returns 0", which introduced the 
problem.
I apologize for this and will submit a patch to revert that fix. 

Why "select sum('a') from src" returns 0 in Hive and MySQL, which created this 
confusion, is still not clear.
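For reference, a minimal sketch of the queries involved, as run from a HiveContext 
in the spark shell (src is the standard Hive test table; the commented results are 
only the behaviours described above):
{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Observed behaviour described above (the reason is still unclear):
hiveContext.sql("select sum('a') from src").collect()             // Hive and MySQL return 0
hiveContext.sql("select cast('a' as double) from src").collect()  // Hive returns NULL per row

// Standard semantics this revert restores:
// SUM over a column whose values are all NULL returns NULL, not 0.
{code}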


> Sum function on all null values, should return zero
> ---
>
> Key: SPARK-5680
> URL: https://issues.apache.org/jira/browse/SPARK-5680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Venkata Ramana G
>Assignee: Venkata Ramana G
>Priority: Minor
> Fix For: 1.3.1, 1.4.0
>
>
> SELECT  sum('a'),  avg('a'),  variance('a'),  std('a') FROM src;
> Current output:
> NULL   NULL   NULL   NULL
> Expected output:
> 0.0    NULL   NULL   NULL
> This fixes hive udaf_number_format.q 






[jira] [Created] (SPARK-7646) Create table support to JDBC Datasource

2015-05-14 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-7646:
---

 Summary: Create table support to JDBC Datasource
 Key: SPARK-7646
 URL: https://issues.apache.org/jira/browse/SPARK-7646
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Venkata Ramana G


Support creating a table through the JDBC data source (JDBCDataSource). A usage example:
{code}
df.saveAsTable(
  "testcreate2",
  "org.apache.spark.sql.jdbc",
  org.apache.spark.sql.SaveMode.Overwrite,
  Map("url" -> s"$url", "dbtable" -> "testcreate2", "user" -> "xx",
    "password" -> "xx", "driver" -> "com.h2.Driver"))
{code}
If the table does not exist, this should create the table and write the 
DataFrame contents to it.
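A small follow-up sketch, under the same assumptions as the example above (H2 url 
and placeholder credentials), of how the created table could be read back to verify 
the write:
{code}
// Read the newly created table back through the JDBC source to check that the
// create-and-write path worked; the options mirror the saveAsTable call above.
val created = sqlContext.load(
  "org.apache.spark.sql.jdbc",
  Map("url" -> s"$url", "dbtable" -> "testcreate2",
      "user" -> "xx", "password" -> "xx", "driver" -> "com.h2.Driver"))
created.show()
{code}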






[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource

2015-05-13 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated SPARK-7601:

Description: 
Support INSERT INTO for the JDBC data source (JDBCDataSource). A usage example:
{code}
sqlContext.sql(
  s"""
|CREATE TEMPORARY TABLE testram1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
driver 'com.h2.Driver')
  """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}

  was:
Support Insert into JDBCDataSource. Following are usage examples
{code}
df.saveAsTable(
"testcreate2",
"org.apache.spark.sql.jdbc",
 org.apache.spark.sql.SaveMode.Overwrite,
 Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", 
"driver"->"com.h2.Driver")
)

or 

sqlContext.sql(
  s"""
|CREATE TEMPORARY TABLE testram1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
driver 'com.h2.Driver')
  """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}


> Support Insert into JDBC Datasource
> ---
>
> Key: SPARK-7601
> URL: https://issues.apache.org/jira/browse/SPARK-7601
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Venkata Ramana G
>
> Support Insert into JDBCDataSource. Following are usage examples
> {code}
> sqlContext.sql(
>   s"""
> |CREATE TEMPORARY TABLE testram1
> |USING org.apache.spark.sql.jdbc
> |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
> driver 'com.h2.Driver')
>   """.stripMargin.replaceAll("\n", " "))
> sqlContext.sql("insert into table testram1 select * from testsrc").show
> {code}






[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource

2015-05-13 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated SPARK-7601:

Description: 
Support Insert into JDBCDataSource. Following are usage examples
{code}
df.saveAsTable(
"testcreate2",
"org.apache.spark.sql.jdbc",
 org.apache.spark.sql.SaveMode.Overwrite,
 Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", "password"->"xx", 
"driver"->"com.h2.Driver")
)

or 

sqlContext.sql(
  s"""
|CREATE TEMPORARY TABLE testram1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
driver 'com.h2.Driver')
  """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}

  was:
Support Insert into JDBCDataSource. Following are usage examples
{code}
df.saveAsTable("testcreate2","org.apache.spark.sql.jdbc", 
org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", 
"dbtable"->"testcreate2", "user"->"xx", "password"->"xx", 
"driver"->"com.h2.Driver"))

or 

sqlContext.sql(
  s"""
|CREATE TEMPORARY TABLE testram1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
driver 'com.h2.Driver')
  """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}


> Support Insert into JDBC Datasource
> ---
>
> Key: SPARK-7601
> URL: https://issues.apache.org/jira/browse/SPARK-7601
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Venkata Ramana G
>
> Support Insert into JDBCDataSource. Following are usage examples
> {code}
> df.saveAsTable(
> "testcreate2",
> "org.apache.spark.sql.jdbc",
>  org.apache.spark.sql.SaveMode.Overwrite,
>  Map("url"->s"$url", "dbtable"->"testcreate2", "user"->"xx", 
> "password"->"xx", "driver"->"com.h2.Driver")
> )
> or 
> sqlContext.sql(
>   s"""
> |CREATE TEMPORARY TABLE testram1
> |USING org.apache.spark.sql.jdbc
> |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
> driver 'com.h2.Driver')
>   """.stripMargin.replaceAll("\n", " "))
> sqlContext.sql("insert into table testram1 select * from testsrc").show
> {code}






[jira] [Created] (SPARK-7601) Support Insert into JDBC Datasource

2015-05-13 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-7601:
---

 Summary: Support Insert into JDBC Datasource
 Key: SPARK-7601
 URL: https://issues.apache.org/jira/browse/SPARK-7601
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Venkata Ramana G


Support INSERT INTO for the JDBC data source (JDBCDataSource). A usage example:
{code}
df.saveAsTable("testcreate2","org.apache.spark.sql.jdbc", 
org.apache.spark.sql.SaveMode.Overwrite, Map("url"->s"$url", 
"dbtable"->"testcreate2", "user"->"xx", "password"->"xx", 
"driver"->"com.h2.Driver"))

or 

sqlContext.sql(
  s"""
|CREATE TEMPORARY TABLE testram1
|USING org.apache.spark.sql.jdbc
|OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', 
driver 'com.h2.Driver')
  """.stripMargin.replaceAll("\n", " "))

sqlContext.sql("insert into table testram1 select * from testsrc").show
{code}






[jira] [Created] (SPARK-7484) Support passing jdbc connection properties for dataframe.createJDBCTable and insertIntoJDBC

2015-05-08 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-7484:
---

 Summary: Support passing jdbc connection properties for 
dataframe.createJDBCTable and insertIntoJDBC
 Key: SPARK-7484
 URL: https://issues.apache.org/jira/browse/SPARK-7484
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Venkata Ramana G
Priority: Minor


A few JDBC drivers, such as Sybase IQ, support passing the username and password 
only through connection properties, so the same needs to be supported here.
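A sketch of the kind of call this issue asks for; the Properties-taking overloads 
do not exist yet, so their exact shape below is illustrative only:
{code}
import java.util.Properties

// Hypothetical overloads requested here: pass driver-specific connection
// properties instead of embedding user/password in the URL, which drivers
// such as Sybase IQ do not allow.
val props = new Properties()
props.setProperty("user", "xx")
props.setProperty("password", "xx")

df.createJDBCTable(s"$url", "testcreate2", false /* allowExisting */, props)
df.insertIntoJDBC(s"$url", "testcreate2", false /* overwrite */, props)
{code}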






[jira] [Commented] (SPARK-6451) Support CombineSum in Code Gen

2015-03-22 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375456#comment-14375456
 ] 

Venkata Ramana G commented on SPARK-6451:
-

I am working on this.

> Support CombineSum in Code Gen
> --
>
> Key: SPARK-6451
> URL: https://issues.apache.org/jira/browse/SPARK-6451
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Blocker
>
> Since we are using CombineSum at the reducer side for the SUM function, we 
> need to make it work in code gen. Otherwise, code gen will not convert 
> Aggregates with a SUM function to GeneratedAggregates (the code gen version).






[jira] [Commented] (SPARK-5818) unable to use "add jar" in hql

2015-03-18 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366811#comment-14366811
 ] 

Venkata Ramana G commented on SPARK-5818:
-

TranslatingClassLoader is used for spark-shell, while Hive's current "add jar" 
works only with a URLClassLoader.

So, in the spark-shell case, the jar has to be added directly to the Spark 
driver's class loader or to its parent loader.
I am working on this.
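As a workaround sketch (an assumption on my part, not the fix for this issue), the 
jar can be put on the driver's classpath when the shell is launched, so Hive's 
"add jar" is not needed from inside the REPL; the UDF class name below is a 
placeholder:
{code}
// Launch the shell with the jar already on the classpath:
//   bin/spark-shell --jars /tmp/brickhouse-0.6.0.jar
// Then only the function registration is needed inside the REPL.
import org.apache.spark.sql.hive._
val sqlContext = new HiveContext(sc)
sqlContext.hql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")
{code}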

> unable to use "add jar" in hql
> --
>
> Key: SPARK-5818
> URL: https://issues.apache.org/jira/browse/SPARK-5818
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0, 1.2.1
>Reporter: pengxu
>
> In Spark 1.2.1 and 1.2.0, it is not possible to use the Hive command "add jar" 
> in hql.
> It seems that the problem from SPARK-2219 still exists.
> the problem can be reproduced as described in the below. Suppose the jar file 
> is named brickhouse-0.6.0.jar and is placed in the /tmp directory
> {code}
> spark-shell>import org.apache.spark.sql.hive._
> spark-shell>val sqlContext = new HiveContext(sc)
> spark-shell>import sqlContext._
> spark-shell>hql("add jar /tmp/brickhouse-0.6.0.jar")
> {code}
> the error message is showed as blow
> {code:title=Error Log}
> 15/02/15 01:36:31 ERROR SessionState: Unable to register 
> /tmp/brickhouse-0.6.0.jar
> Exception: org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be 
> cast to java.net.URLClassLoader
> java.lang.ClassCastException: 
> org.apache.spark.repl.SparkIMain$TranslatingClassLoader cannot be cast to 
> java.net.URLClassLoader
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.addToClassPath(Utilities.java:1921)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.registerJar(SessionState.java:599)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState$ResourceType$2.preHook(SessionState.java:658)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resource(SessionState.java:732)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resource(SessionState.java:717)
>   at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:54)
>   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:319)
>   at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>   at 
> org.apache.spark.sql.hive.execution.AddJar.sideEffectResult$lzycompute(commands.scala:74)
>   at 
> org.apache.spark.sql.hive.execution.AddJar.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
>   at org.apache.spark.sql.hive.execution.AddJar.execute(commands.scala:68)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>   at 
> org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>   at org.apache.spark.sql.SchemaRDD.(SchemaRDD.scala:108)
>   at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
>   at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
>   at 
> $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:24)
>   at 
> $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:29)
>   at 
> $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31)
>   at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33)
>   at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:35)
>   at $line30.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:37)
>   at $line30.$read$$iwC$$iwC$$iwC$$iwC.(:39)
>   at $line30.$read$$iwC$$iwC$$iwC.(:41)
>   at $line30.$read$$iwC$$iwC.(:43)
>   at $line30.$read$$iwC.(:45)
>   at $line30.$read.(:47)
>   at $line30.$read$.(:51)
>   at $line30.$read$.()
>   at $line30.$eval$.(:7)
>   at $line30.$eval$.()
>   at $line30.$eval.$print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>   at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>   at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>   at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>   at 
> org.apache.spark.repl.S

[jira] [Created] (SPARK-5765) word split problem in run-example and compute-classpath

2015-02-12 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-5765:
---

 Summary: word split problem in run-example and compute-classpath
 Key: SPARK-5765
 URL: https://issues.apache.org/jira/browse/SPARK-5765
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.2.1, 1.3.0, 1.1.2
Reporter: Venkata Ramana G


Word-split problem with the Spark directory path in the run-example and 
compute-classpath.sh scripts.

This was introduced by the fix for SPARK-4504.






[jira] [Created] (SPARK-5680) Sum function on all null values, should return zero

2015-02-08 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-5680:
---

 Summary: Sum function on all null values, should return zero
 Key: SPARK-5680
 URL: https://issues.apache.org/jira/browse/SPARK-5680
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Venkata Ramana G
Priority: Minor


SELECT  sum('a'),  avg('a'),  variance('a'),  std('a') FROM src;
Current output:
NULL   NULL   NULL   NULL
Expected output:
0.0    NULL   NULL   NULL

This fixes hive udaf_number_format.q 






[jira] [Created] (SPARK-4504) run-example fails if multiple example jars present in target folder

2014-11-19 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-4504:
---

 Summary: run-example fails if multiple example jars present in 
target folder
 Key: SPARK-4504
 URL: https://issues.apache.org/jira/browse/SPARK-4504
 Project: Spark
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.1.0, 1.2.0
Reporter: Venkata Ramana G
Priority: Minor


It gives the following error:

bin/run-example: line 39: [: 
/mnt/d/spark/spark/examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop1.0.4.jar:
 binary operator expected
Failed to find Spark examples assembly in /mnt/d/spark/spark/lib or 
/mnt/d/spark/spark/examples/target
You need to build Spark before running this program






[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2014-11-09 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204263#comment-14204263
 ] 

Venkata Ramana G commented on SPARK-4296:
-

Aliases are being added implicitly to structure fields in the GROUP BY for 
aggregate expressions, so the comparison of the aggregate expression against 
the GROUP BY expression fails:
Upper(birthday#11.date AS date#17) is compared against Upper(birthday#11.date).
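A rough sketch of the mismatch, assuming Catalyst's Alias and Expression classes; 
the helper is only illustrative, not the actual fix:
{code}
import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}

// Illustrative helper: strip implicit aliases before comparing an aggregate
// expression with the corresponding GROUP BY expression, so that
// Upper(birthday#11.date AS date#17) and Upper(birthday#11.date) compare equal.
def stripAliases(e: Expression): Expression = e.transform {
  case Alias(child, _) => child
}
{code}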

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Shixiong Zhu
>
> When the input data has a complex structure, using same expression in group 
> by clause and  select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias 
> expression.






[jira] [Commented] (SPARK-4263) PERCENTILE is not working

2014-11-05 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199741#comment-14199741
 ] 

Venkata Ramana G commented on SPARK-4263:
-

According to the error, key looks like a string-type column.
Support for an array-type parameter to percentile was submitted under PR 
https://github.com/apache/spark/pull/2802 for [SPARK-3891].
The same case is covered in that PR's test case.

> PERCENTILE is not working
> -
>
> Key: SPARK-4263
> URL: https://issues.apache.org/jira/browse/SPARK-4263
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Cheng Hao
>Priority: Minor
>
> When query "select percentile(key, array(0, 0.5,1)) from src", it will throws 
> exception like:
> {panel}
> org.apache.hadoop.hive.ql.exec.NoMatchingMethodException: No matching method 
> for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (string, 
> array). Possible choices: _FUNC_(bigint, array)  
> _FUNC_(bigint, double)  
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.getMethodInternal(FunctionRegistry.java:1213)
>   at 
> org.apache.hadoop.hive.ql.exec.DefaultUDAFEvaluatorResolver.getEvaluatorClass(DefaultUDAFEvaluatorResolver.java:84)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge.getEvaluator(GenericUDAFBridge.java:56)
>   at 
> org.apache.spark.sql.hive.HiveUdaf.objectInspector$lzycompute(hiveUdfs.scala:234)
>   at 
> org.apache.spark.sql.hive.HiveUdaf.objectInspector(hiveUdfs.scala:233)
>   at org.apache.spark.sql.hive.HiveUdaf.dataType(hiveUdfs.scala:241)
>   at 
> org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.output(basicOperators.scala:143)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Limit.output(basicOperators.scala:147)
>   at 
> org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61)
>   at 
> org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$unapply$1.apply(patterns.scala:61)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.sql.catalyst.planning.PhysicalOperation$.unapply(patterns.scala:61)
>   at 
> org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:425)
>   at 
> org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:59)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:276)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:211)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> {panel}






[jira] [Commented] (SPARK-4234) Always do paritial aggregation

2014-11-05 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198694#comment-14198694
 ] 

Venkata Ramana G commented on SPARK-4234:
-

As I understand it, HiveUDAF and HiveGenericUDAF currently support only 
AggregateExpression and do not support partial aggregation.
So we should support PartialAggregate for HiveUDAF and HiveGenericUDAF, using 
Hive UDAF's partial aggregation interfaces (see the sketch below).
Please correct my understanding.
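For reference, a sketch of the Hive-side interfaces meant above (Hive's public 
GenericUDAFEvaluator API; how Spark would drive these phases is not shown and is 
only my assumption of the shape of the work):
{code}
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode

// GenericUDAFEvaluator already models partial aggregation through its modes:
//   Mode.PARTIAL1 - map side:    iterate() over raw rows, then terminatePartial()
//   Mode.PARTIAL2 - combine:     merge() partial results, then terminatePartial()
//   Mode.FINAL    - reduce side: merge() partial results, then terminate()
//   Mode.COMPLETE - no partials: iterate() then terminate()
// A PartialAggregate path for HiveUdaf would map Spark's map-side and
// reduce-side aggregation phases onto these modes.
{code}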

> Always do paritial aggregation 
> ---
>
> Key: SPARK-4234
> URL: https://issues.apache.org/jira/browse/SPARK-4234
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Cheng Hao
>
> Currently, a UDAF developer optionally implements a partial aggregation 
> function; however, allowing that can cause performance issues. We could 
> instead always require developers to provide the partial aggregation 
> function, as Hive does, so that we always get the map-side aggregation 
> optimization.






[jira] [Commented] (SPARK-4252) SparkSQL behaves differently from Hive when encountering illegal record

2014-11-05 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198570#comment-14198570
 ] 

Venkata Ramana G commented on SPARK-4252:
-

When I executed the same over Hive 0.12 from the Hive command line, it gives this result:
hive> select * from user;
OK
Alice   12
Bob     13


> SparkSQL behaves differently from Hive when encountering illegal record
> ---
>
> Key: SPARK-4252
> URL: https://issues.apache.org/jira/browse/SPARK-4252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: patrickliu
>
> Hive will ignore illegal record, while SparkSQL will try to convert illegal 
> record.
> Assume I have a text file user.txt with 2 records(userName, age):
> Alice,12.4
> Bob,13
> Then I create a Hive table to query the data:
> CREATE TABLE user(
> name string,
> age int, (Pay attention! The field is int)
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
> LOAD DATA LOCAL INPATH 'user' INTO TABLE user;
> Then I use Hive and SparkSQL to query the 'user' table:
> SQL: select * from user;
> Result by Hive:
> Alice NULL( Hive ignore Alice's age because it is a float number )
> Bob 13
> Result by SparkSQL:
> Alice 12 ( SparkSQL converts Alice's age from float to int )
> Bob 13
> So if I run, "select sum(age) from user;"
> Then I will get different result.
> Maybe SparkSQL should be compatible with Hive in this scenario.






[jira] [Resolved] (SPARK-3593) Support Sorting of Binary Type Data

2014-11-05 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G resolved SPARK-3593.
-
Resolution: Fixed

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
>Assignee: Venkata Ramana G
> Fix For: 1.2.0
>
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.






[jira] [Reopened] (SPARK-3593) Support Sorting of Binary Type Data

2014-11-05 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G reopened SPARK-3593:
-

Reopened to assign to myself

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
> Fix For: 1.2.0
>
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.






[jira] [Commented] (SPARK-4217) Result of SparkSQL is incorrect after a table join and group by operation

2014-11-04 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197799#comment-14197799
 ] 

Venkata Ramana G commented on SPARK-4217:
-

I executed them on Hive 0.12 (from the Hive command line) and on the latest 
Spark SQL master (from the spark shell, using HiveContext connected to Hive 0.12).


> Result of SparkSQL is incorrect after a table join and group by operation
> -
>
> Key: SPARK-4217
> URL: https://issues.apache.org/jira/browse/SPARK-4217
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
> Environment: Hadoop 2.2.0
> Spark1.1
>Reporter: peter.zhang
>Priority: Critical
> Attachments: TestScript.sql, saledata.zip
>
>
> I ran a test using the same SQL script in SparkSQL, Shark, and Hive 
> environments (a pure Hive application rather than Spark HiveContext), as below:
> ---
> select c.theyear, sum(b.amount)
> from tblstock a
> join tblStockDetail b on a.ordernumber = b.ordernumber
> join tbldate c on a.dateid = c.dateid
> group by c.theyear;
> result of hive/shark:
> theyear   _c1
> 2004  1403018
> 2005  5557850
> 2006  7203061
> 2007  11300432
> 2008  12109328
> 2009  5365447
> 2010  188944
> result of SparkSQL:
> 2010  210924
> 2004  3265696
> 2005  13247234
> 2006  13670416
> 2007  16711974
> 2008  14670698
> 2009  6322137






[jira] [Commented] (SPARK-4217) Result of SparkSQL is incorrect after a table join and group by operation

2014-11-04 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196674#comment-14196674
 ] 

Venkata Ramana G commented on SPARK-4217:
-

I have executed this on Hive and SparkSQL; it looks like your "Hive" result is 
wrong. I got the same result on SparkSQL and Hive:

SparkSQL
[2010,210924]
[2004,3265696]
[2005,13247234]
[2006,13670416]
[2007,16711974]
[2008,14670698]
[2009,6322137]

Hive
2004    3265696
2005    13247234
2006    13670416
2007    16711974
2008    14670698
2009    6322137
2010    210924

> Result of SparkSQL is incorrect after a table join and group by operation
> -
>
> Key: SPARK-4217
> URL: https://issues.apache.org/jira/browse/SPARK-4217
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
> Environment: Hadoop 2.2.0
> Spark1.1
>Reporter: peter.zhang
>Priority: Critical
> Attachments: TestScript.sql, saledata.zip
>
>
> I ran a test using the same SQL script in SparkSQL, Shark, and Hive 
> environments, as below:
> ---
> select c.theyear, sum(b.amount)
> from tblstock a
> join tblStockDetail b on a.ordernumber = b.ordernumber
> join tbldate c on a.dateid = c.dateid
> group by c.theyear;
> result of hive/shark:
> theyear   _c1
> 2004  1403018
> 2005  5557850
> 2006  7203061
> 2007  11300432
> 2008  12109328
> 2009  5365447
> 2010  188944
> result of SparkSQL:
> 2010  210924
> 2004  3265696
> 2005  13247234
> 2006  13670416
> 2007  16711974
> 2008  14670698
> 2009  6322137






[jira] [Commented] (SPARK-4201) Can't use concat() on partition column in where condition (Hive compatibility problem)

2014-11-03 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194847#comment-14194847
 ] 

Venkata Ramana G commented on SPARK-4201:
-

I found that the same works on the latest master; please confirm.

> Can't use concat() on partition column in where condition (Hive compatibility 
> problem)
> --
>
> Key: SPARK-4201
> URL: https://issues.apache.org/jira/browse/SPARK-4201
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0, 1.1.0
> Environment: Hive 0.12+hadoop 2.4/hadoop 2.2 +spark 1.1
>Reporter: dongxu
>Priority: Minor
>  Labels: com
>
> The team used Hive for queries, and we are trying to move to spark-sql.
> When I run a statement like this:
> select count(1) from gulfstream_day_driver_base_2 where 
> concat(year,month,day) = '20140929';
> it doesn't work, but it works well in Hive.
> I have to rewrite the SQL as "select count(1) from 
> gulfstream_day_driver_base_2 where year = 2014 and month = 09 and day = 29".
> Here is the error log:
> 14/11/03 15:05:03 ERROR SparkSQLDriver: Failed in [select count(1) from  
> gulfstream_day_driver_base_2 where  concat(year,month,day) = '20140929']
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Aggregate false, [], [SUM(PartialCount#1390L) AS c_0#1337L]
>  Exchange SinglePartition
>   Aggregate true, [], [COUNT(1) AS PartialCount#1390L]
>HiveTableScan [], (MetastoreRelation default, 
> gulfstream_day_driver_base_2, None), 
> Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341)
>  = 20140929))
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>   at org.apache.spark.sql.execution.Aggregate.execute(Aggregate.scala:126)
>   at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
>   at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
>   at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:415)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> execute, tree:
> Exchange SinglePartition
>  Aggregate true, [], [COUNT(1) AS PartialCount#1390L]
>   HiveTableScan [], (MetastoreRelation default, gulfstream_day_driver_base_2, 
> None), 
> Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341)
>  = 20140929))
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>   at org.apache.spark.sql.execution.Exchange.execute(Exchange.scala:44)
>   at 
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1.apply(Aggregate.scala:128)
>   at 
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1.apply(Aggregate.scala:127)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
>   ... 16 more
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> execute, tree:
> Aggregate true, [], [COUNT(1) AS PartialCount#1390L]
>  HiveTableScan [], (MetastoreRelation default, gulfstream_day_driver_base_2, 
> None), 
> Some((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFConcat(year#1339,month#1340,day#1341)
>  = 20140929))
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>   at org.apache.spark.sql.execution.Aggregate.execute(Aggregate.scala:126)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$execute$1.apply(Exchange.scala:86)
>   at 
> org.apache.spark.sql.execution.Exchange$$anonfun$execute$1.apply(Exchange.scal

[jira] [Commented] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values

2014-10-30 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191299#comment-14191299
 ] 

Venkata Ramana G commented on SPARK-4077:
-

I could not find this behaviour for non-text sources.
In the case of runSqlHive, rows are serialized one by one 
(org.apache.hadoop.hive.ql.exec.FetchTask.fetch), so this problem is not 
visible there.

> A broken string timestamp value can Spark SQL return wrong values for valid 
> string timestamp values
> ---
>
> Key: SPARK-4077
> URL: https://issues.apache.org/jira/browse/SPARK-4077
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yin Huai
>Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
> The DDL statement and the query are shown below...
> {code}
> sql("""
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is 
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with broken string timestamp value, I tried runSqlHive. The 
> result is fine.






[jira] [Comment Edited] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values

2014-10-30 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020
 ] 

Venkata Ramana G edited comment on SPARK-4077 at 10/30/14 1:05 PM:
---

In org.apache.hadoop.hive.serde2.io.TimestampWritable.set, if the next entry 
is null then the current timestamp object is reset.
I am not sure why Hive does it that way; we could also raise a bug in Hive.

However, because of this, HiveInspectors.unwrap cannot reuse the same timestamp 
object without creating a copy.


was (Author: gvramana):
In org.apache.hadoop.hive.serde2.io.TimestampWritable.init , if the next entry 
is null then current time stamp object is being reset. 
Not sure why it is done like that in hive. We also can raise a bug in hive.

However because of this hiveinspectors:unwrap cannot use the same timestamp 
object without creating a copy. 

> A broken string timestamp value can Spark SQL return wrong values for valid 
> string timestamp values
> ---
>
> Key: SPARK-4077
> URL: https://issues.apache.org/jira/browse/SPARK-4077
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yin Huai
>Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
> The DDL statement and the query are shown below...
> {code}
> sql("""
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is 
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with broken string timestamp value, I tried runSqlHive. The 
> result is fine.






[jira] [Commented] (SPARK-4077) A broken string timestamp value can Spark SQL return wrong values for valid string timestamp values

2014-10-30 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020
 ] 

Venkata Ramana G commented on SPARK-4077:
-

In org.apache.hadoop.hive.serde2.io.TimestampWritable.init, if the next entry 
is null then the current timestamp object is reset.
I am not sure why Hive does it that way; we could also raise a bug in Hive.

However, because of this, HiveInspectors.unwrap cannot reuse the same timestamp 
object without creating a copy.
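A minimal sketch of the defensive copy implied above (illustrative only, not the 
actual HiveInspectors change):
{code}
import java.sql.Timestamp
import org.apache.hadoop.hive.serde2.io.TimestampWritable

// Copy the value out of the shared TimestampWritable instead of keeping a
// reference to it, since Hive may reset that object when the next entry is null.
def copyTimestamp(tw: TimestampWritable): Timestamp = {
  val shared = tw.getTimestamp
  val copy = new Timestamp(shared.getTime)
  copy.setNanos(shared.getNanos)
  copy
}
{code}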

> A broken string timestamp value can Spark SQL return wrong values for valid 
> string timestamp values
> ---
>
> Key: SPARK-4077
> URL: https://issues.apache.org/jira/browse/SPARK-4077
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yin Huai
>Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
> The DDL statement and the query are shown below...
> {code}
> sql("""
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is 
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to 
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with broken string timestamp value, I tried runSqlHive. The 
> result is fine.






[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate

2014-10-22 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179813#comment-14179813
 ] 

Venkata Ramana G commented on SPARK-3815:
-

The issue is still not reproducible; both with and without LIMIT it works 
fine for me.
As LIMIT triggers closure cleanup, there must be some other reason for the 
exception thrown during cleanup.
I think the exact data set and script are required to reproduce it.
Check whether this issue is related to SPARK-3517
(https://github.com/apache/spark/pull/2376);
maybe you can try that patch.

> LPAD function does not work in where predicate
> --
>
> Key: SPARK-3815
> URL: https://issues.apache.org/jira/browse/SPARK-3815
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yana Kadiyska
>Priority: Minor
>
> select customer_id from mytable where 
> pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2
> produces:
> 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing 
> query:
> org.apache.spark.SparkException: Task not serializable
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
> at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597)
> at 
> org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeObject(ObjectOut

[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate

2014-10-18 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176226#comment-14176226
 ] 

Venkata Ramana G commented on SPARK-3815:
-

I found this working fine on the latest release. [~yanakad], can you please 
verify again? Thanks.

> LPAD function does not work in where predicate
> --
>
> Key: SPARK-3815
> URL: https://issues.apache.org/jira/browse/SPARK-3815
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yana Kadiyska
>Priority: Minor
>
> select customer_id from mytable where 
> pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2
> produces:
> 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing 
> query:
> org.apache.spark.SparkException: Task not serializable
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
> at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597)
> at 
> org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> The following wo

[jira] [Commented] (SPARK-3891) Support Hive Percentile UDAF with array of percentile values

2014-10-14 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171019#comment-14171019
 ] 

Venkata Ramana G commented on SPARK-3891:
-

The following problems need to be fixed to pass an array to the percentile and 
percentile_approx UDAFs (example queries are sketched below):
1. For the percentile UDAF, the parameters are not wrapped before being passed 
to the UDAF.
2. percentile_approx takes only a constant inspector as its parameter, so 
constant-inspector support needs to be added for GenericUDAF.
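For concreteness, the kind of queries that should work once both points are fixed; 
the table and column names come from the report below, and the percentile_approx 
call follows the syntax quoted there:
{code}
// percentile with an array of percentile values:
sqlContext.sql(
  "select name, percentile(turnaroundtime, array(0, 0.25, 0.5, 0.75, 1)) " +
  "from exam group by name").collect()

// percentile_approx with an array of percentile values and the optional B:
sqlContext.sql(
  "select percentile_approx(turnaroundtime, array(0.25, 0.75), 100) " +
  "from exam").collect()
{code}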

> Support Hive Percentile UDAF with array of percentile values
> 
>
> Key: SPARK-3891
> URL: https://issues.apache.org/jira/browse/SPARK-3891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
> Environment: Spark 1.2.0 trunk 
> (ac302052870a650d56f2d3131c27755bb2960ad7) on
> CDH 5.1.0
> Centos 6.5
> 8x 2GHz, 24GB RAM
>Reporter: Anand Mohan Tumuluri
>Assignee: Venkata Ramana G
>
> Spark PR 2620 brings in the support of Hive percentile UDAF.
> However Hive percentile and percentile_approx UDAFs also support returning an 
> array of percentile values with the syntax
> percentile(BIGINT col, array(p1 [, p2]...)) or 
> percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B])
> These queries are failing with the below error:
> 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, 
> percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com): 
> java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be 
> cast to [Ljava.lang.Object;
> 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
> 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
> 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
> 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
> org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
> 
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
> 
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)






[jira] [Commented] (SPARK-2155) Support effectful / non-deterministic key expressions in CASE WHEN statements

2014-10-13 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169234#comment-14169234
 ] 

Venkata Ramana G commented on SPARK-2155:
-

We can separate CASE KEY WHEN and CASE WHEN into two expressions that share a
common abstract base class containing the common code.
This would address the redundant-evaluation problem without duplicating code.
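
A minimal, self-contained sketch of that shape (simplified stand-ins, not
Catalyst's actual Expression classes): the shared base holds the
branch-selection logic, CaseWhen tests each branch condition, and CaseKeyWhen
evaluates the key expression exactly once per row.
{code}
trait Expr { def eval(row: Map[String, Any]): Any }

case class Literal(value: Any) extends Expr {
  def eval(row: Map[String, Any]): Any = value
}
case class Attr(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row(name)
}

abstract class CaseWhenBase extends Expr {
  def branches: Seq[(Expr, Expr)]   // (branch input, result) pairs
  def elseExpr: Option[Expr]

  // Common code: pick the first branch whose evaluated input satisfies `test`,
  // otherwise fall back to the ELSE expression (or null).
  protected final def selectBranch(test: Any => Boolean, row: Map[String, Any]): Any =
    branches
      .collectFirst { case (b, r) if test(b.eval(row)) => r.eval(row) }
      .getOrElse(elseExpr.map(_.eval(row)).orNull)
}

// CASE WHEN cond THEN result ... END: branch inputs are boolean conditions.
case class CaseWhen(branches: Seq[(Expr, Expr)], elseExpr: Option[Expr])
    extends CaseWhenBase {
  def eval(row: Map[String, Any]): Any = selectBranch(_ == true, row)
}

// CASE key WHEN value THEN result ... END: the key (possibly effectful or
// non-deterministic) is evaluated exactly once per row.
case class CaseKeyWhen(key: Expr, branches: Seq[(Expr, Expr)], elseExpr: Option[Expr])
    extends CaseWhenBase {
  def eval(row: Map[String, Any]): Any = {
    val k = key.eval(row)
    selectBranch(_ == k, row)
  }
}

// Example: CASE dept WHEN 'eng' THEN 1 WHEN 'ops' THEN 2 ELSE 0 END
val expr = CaseKeyWhen(Attr("dept"),
  Seq(Literal("eng") -> Literal(1), Literal("ops") -> Literal(2)),
  Some(Literal(0)))
println(expr.eval(Map("dept" -> "ops")))   // 2
{code}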

> Support effectful / non-deterministic key expressions in CASE WHEN statements
> -
>
> Key: SPARK-2155
> URL: https://issues.apache.org/jira/browse/SPARK-2155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Zongheng Yang
>Priority: Minor
>
> Currently we translate CASE KEY WHEN to CASE WHEN, hence incurring redundant 
> evaluations of the key expression. Relevant discussions here: 
> https://github.com/apache/spark/pull/1055/files#r13784248
> If we are very in need of support for effectful key expressions, at least we 
> can resort to the baseline approach of having both CaseWhen and CaseKeyWhen 
> as expressions, which seem to introduce much code duplication (e.g. see 
> https://github.com/concretevitamin/spark/blob/47d406a58d129e5bba68bfadf9dd1faa9054d834/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L216
>  for a sketch implementation). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3892) Map type should have typeName

2014-10-10 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166724#comment-14166724
 ] 

Venkata Ramana G commented on SPARK-3892:
-

Can you explain in detail?

> Map type should have typeName
> -
>
> Key: SPARK-3892
> URL: https://issues.apache.org/jira/browse/SPARK-3892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Adrian Wang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"

2014-10-09 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G closed SPARK-3879.
---
Resolution: Duplicate

> spark-shell.cmd fails giving error "!=x was unexpected at this time"
> 
>
> Key: SPARK-3879
> URL: https://issues.apache.org/jira/browse/SPARK-3879
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: Windows
>Reporter: Venkata Ramana G
>
> spark-shell.cmd giving error "!=x was unexpected at this time"
> This problem is introduced during SPARK-2058



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"

2014-10-09 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165225#comment-14165225
 ] 

Venkata Ramana G commented on SPARK-3879:
-

It was already fixed under SPARK-3808, so this issue can be closed.

> spark-shell.cmd fails giving error "!=x was unexpected at this time"
> 
>
> Key: SPARK-3879
> URL: https://issues.apache.org/jira/browse/SPARK-3879
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: Windows
>Reporter: Venkata Ramana G
>
> spark-shell.cmd giving error "!=x was unexpected at this time"
> This problem is introduced during SPARK-2058



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"

2014-10-09 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165209#comment-14165209
 ] 

Venkata Ramana G commented on SPARK-3879:
-

I have fixed this and am about to submit a PR.

> spark-shell.cmd fails giving error "!=x was unexpected at this time"
> 
>
> Key: SPARK-3879
> URL: https://issues.apache.org/jira/browse/SPARK-3879
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: Windows
>Reporter: Venkata Ramana G
>
> spark-shell.cmd giving error "!=x was unexpected at this time"
> This problem is introduced during SPARK-2058



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3879) spark-shell.cmd fails giving error "!=x was unexpected at this time"

2014-10-09 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created SPARK-3879:
---

 Summary: spark-shell.cmd fails giving error "!=x was unexpected at 
this time"
 Key: SPARK-3879
 URL: https://issues.apache.org/jira/browse/SPARK-3879
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
 Environment: Windows
Reporter: Venkata Ramana G


spark-shell.cmd giving error "!=x was unexpected at this time"
This problem is introduced during SPARK-2058



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3845) SQLContext(...) should inherit configurations from SparkContext

2014-10-08 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163367#comment-14163367
 ] 

Venkata Ramana G commented on SPARK-3845:
-

As I understand it, that is already the way it works:
all configuration options that start with "spark.sql" are copied from the
SparkContext into the SQLContext.
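
For illustration only (not Spark's internal code), the effect is the same as
filtering the SparkConf for spark.sql.* keys and applying them with setConf:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("sql-conf-demo")
  .setMaster("local[*]")
  .set("spark.sql.shuffle.partitions", "8")   // a spark.sql.* option

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Done by hand, the copy of spark.sql.* options would look like this:
sc.getConf.getAll
  .filter { case (key, _) => key.startsWith("spark.sql") }
  .foreach { case (key, value) => sqlContext.setConf(key, value) }

println(sqlContext.getConf("spark.sql.shuffle.partitions"))   // 8
{code}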

> SQLContext(...) should inherit configurations from SparkContext
> ---
>
> Key: SPARK-3845
> URL: https://issues.apache.org/jira/browse/SPARK-3845
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Jianshi Huang
>
> It's very confusing that Spark configurations (e.g. spark.serializer, 
> spark.speculation, etc.) can be set in the spark-default.conf file, while 
> SparkSQL configurations (e..g spark.sql.inMemoryColumnarStorage.compressed, 
> spark.sql.codegen, etc.) has to be set either in sqlContext.setConf or 
> sql("SET ...").
> When I do:
>   val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
> I would expect sqlContext recognizes all the SQL configurations comes with 
> sparkContext.
> Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3559) appendReadColumnIDs and appendReadColumnNames introduce unnecessary columns in the lists of needed column ids and column names stored in hiveConf

2014-10-08 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163218#comment-14163218
 ] 

Venkata Ramana G commented on SPARK-3559:
-

Because the same hiveConf is reused across queries, the column lists keep
getting appended to and cannot be restricted to only the required columns.
The HiveConf can be cloned in TableScanOperator and the required properties
configured on the clone.
The deserializers expect this property to be set in the HiveConf, not in
table-specific properties.
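
A rough sketch of that approach (the Hive method signatures are assumed from
Hive 0.13, and this is not the final patch): clone the shared HiveConf per
table scan, then append only that scan's columns on the clone.
{code}
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils

// Per-scan copy: the shared conf stays untouched for other scans in the query.
def localScanConf(shared: HiveConf,
                  columnIds: Seq[Int],
                  columnNames: Seq[String]): HiveConf = {
  val local = new HiveConf(shared)
  ColumnProjectionUtils.appendReadColumnIDs(local, columnIds.map(Int.box).asJava)
  ColumnProjectionUtils.appendReadColumnNames(local, columnNames.asJava)
  local
}
{code}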

> appendReadColumnIDs and appendReadColumnNames introduce unnecessary columns 
> in the lists of needed column ids and column names stored in hiveConf
> -
>
> Key: SPARK-3559
> URL: https://issues.apache.org/jira/browse/SPARK-3559
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Blocker
>
> Because we are using the same hiveConf and we are currently using 
> ColumnProjectionUtils.appendReadColumnIDs 
> ColumnProjectionUtils.appendReadColumnNames to append needed column ids and 
> names for a table, lists of needed column ids and names can have unnecessary 
> columns.
> Also, for a join operation, TableScanOperators for both tables are sharing 
> the same hiveConf and they may need to set table-specific properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-3034) [HIve] java.sql.Date cannot be cast to java.sql.Timestamp

2014-10-05 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated SPARK-3034:

Comment: was deleted

(was: Requires adding another data type DateType
Modifications required in parser, datatype addition and DataType conversion to 
and from TimeStamp and String. 
Compatibility with Date supported in Hive 0.12.0.
Date UDFs compatibility.

Started working on the same.)

> [HIve] java.sql.Date cannot be cast to java.sql.Timestamp
> -
>
> Key: SPARK-3034
> URL: https://issues.apache.org/jira/browse/SPARK-3034
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.0.2
>Reporter: pengyanhong
>Priority: Blocker
>
> run a simple HiveQL via yarn-cluster, got error as below:
> {quote}
> Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:199)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0.0:127 failed 3 times, most recent failure: Exception failure in TID 
> 141 on host A01-R06-I147-41.jd.local: java.lang.ClassCastException: 
> java.sql.Date cannot be cast to java.sql.Timestamp
> 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33)
> 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:251)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:200)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:192)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> org.apache.spark.scheduler.Task.run(Task.scala:51)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJ

[jira] [Commented] (SPARK-3034) [HIve] java.sql.Date cannot be cast to java.sql.Timestamp

2014-10-05 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159998#comment-14159998
 ] 

Venkata Ramana G commented on SPARK-3034:
-

This requires adding a new data type, DateType.
Modifications are needed in the parser, in the data type definitions, and in
the DataType conversions to and from Timestamp and String.
It also needs to be compatible with the Date type supported in Hive 0.12.0 and
with the Date UDFs.

I have started working on this.
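
For reference, the conversions the new DateType has to cover are plain JDK
operations (illustrative snippet, not the Catalyst implementation):
{code}
import java.sql.{Date, Timestamp}

val d: Date          = Date.valueOf("2014-10-05")   // String -> Date
val ts: Timestamp    = new Timestamp(d.getTime)     // Date -> Timestamp
val backToDate: Date = new Date(ts.getTime)         // Timestamp -> Date
val asString: String = backToDate.toString          // Date -> String, "2014-10-05"
{code}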

> [HIve] java.sql.Date cannot be cast to java.sql.Timestamp
> -
>
> Key: SPARK-3034
> URL: https://issues.apache.org/jira/browse/SPARK-3034
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.0.2
>Reporter: pengyanhong
>Priority: Blocker
>
> run a simple HiveQL via yarn-cluster, got error as below:
> {quote}
> Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:199)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0.0:127 failed 3 times, most recent failure: Exception failure in TID 
> 141 on host A01-R06-I147-41.jd.local: java.lang.ClassCastException: 
> java.sql.Date cannot be cast to java.sql.Timestamp
> 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33)
> 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:251)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:486)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:439)
> 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:423)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:200)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$3$$anonfun$apply$1.apply(InsertIntoHiveTable.scala:192)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:149)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
> 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> org.apache.spark.scheduler.Task.run(Task.scala:51)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   

[jira] [Commented] (SPARK-3593) Support Sorting of Binary Type Data

2014-10-01 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154863#comment-14154863
 ] 

Venkata Ramana G commented on SPARK-3593:
-

BinaryType is currently not derived from NativeType and has no Ordering
support.
BinaryType can therefore be moved under NativeType, since it already has a
JvmType defined, and an Ordering needs to be implemented for it.
Hive also classifies binary as a primitive type, keeping arrays, maps, structs
and unions as complex types.
This is similar to the current TimestampType handling.
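
A sketch of the missing piece (not the exact Catalyst code): a lexicographic
Ordering for binary values, comparing bytes as unsigned, analogous to how
TimestampType already carries an Ordering.
{code}
val binaryOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
  def compare(x: Array[Byte], y: Array[Byte]): Int = {
    val len = math.min(x.length, y.length)
    var i = 0
    while (i < len) {
      val cmp = (x(i) & 0xff) - (y(i) & 0xff)   // unsigned byte comparison
      if (cmp != 0) return cmp
      i += 1
    }
    x.length - y.length                          // shorter array sorts first
  }
}

// Example: ORDER BY on a binary column would sort values like this.
val sorted = Seq(Array[Byte](1, 2), Array[Byte](0, 9), Array[Byte](1))
  .sorted(binaryOrdering)
sorted.foreach(a => println(a.mkString("[", ",", "]")))   // [0,9] [1] [1,2]
{code}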

> Support Sorting of Binary Type Data
> ---
>
> Key: SPARK-3593
> URL: https://issues.apache.org/jira/browse/SPARK-3593
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Paul Magid
>
> If you try sorting on a binary field you currently get an exception.   Please 
> add support for binary data type sorting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3268) DoubleType should support modulus

2014-09-18 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138763#comment-14138763
 ] 

Venkata Ramana G commented on SPARK-3268:
-

Problem:
double, decimal and float fall under the fractional type hierarchy, while in
Scala the % (rem) function is implemented in the integral type hierarchy.
Currently no mechanism exists to let a fractional type use the integral
operations.

Solution:
Scala provides classes such as DoubleAsIfIntegral, FloatAsIfIntegral and
BigDecimalAsIfIntegral that allow these types to work with integral operators.
So we can add an asIntegral member to FractionalType to support calling the
Integral-related functions; the i2 function can then dispatch to asIntegral to
execute them.

I have implemented this; writing the test cases is pending, and I will submit
the patch.
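
A small sketch of what those Scala classes enable (the wiring into
FractionalType and i2 itself is not shown; the query literal comes from the
report below):
{code}
import scala.math.Numeric.{DoubleAsIfIntegral, FloatAsIfIntegral}

// Integral-style remainder on fractional JVM types, e.g. "1388632775.0 % 60".
val doubleMod = DoubleAsIfIntegral.rem(1388632775.0, 60.0)   // 35.0
val floatMod  = FloatAsIfIntegral.rem(7.5f, 2.0f)            // 1.5f

println(doubleMod)
println(floatMod)
{code}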

> DoubleType should support modulus 
> --
>
> Key: SPARK-3268
> URL: https://issues.apache.org/jira/browse/SPARK-3268
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Chris Grier
>Priority: Minor
>
> Using the modulus operator (%) on Doubles throws and exception. 
> eg: 
> SELECT 1388632775.0 % 60 from tablename LIMIT 1
> Throws: 
> java.lang.Exception: Type DoubleType does not support numeric operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2189) Method for removing temp tables created by registerAsTable

2014-08-26 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110740#comment-14110740
 ] 

Venkata Ramana G commented on SPARK-2189:
-

unregisterTempTable("cachedTableName"): this API should uncache tables that
are registered with an InMemoryRelation.
I could not find any useful use case where the cache is still required after
unregisterTempTable.

If there is a valid use case, the API can be modified to give the user more
control:
unregisterTempTable("cachedTableName", unCacheTables = true). By default this
API should uncache tables registered with an InMemoryRelation, but the user
can pass unCacheTables = false to change the behaviour.
Please comment.
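
The proposed shape, written out (hypothetical signature, not an existing Spark
method; the parameter name and default come from the proposal above):
{code}
trait TempTableCleanup {
  /** Remove the temp table registration. When unCacheTables is true (the
    * default), also uncache any InMemoryRelation that backs the table. */
  def unregisterTempTable(tableName: String, unCacheTables: Boolean = true): Unit
}
{code}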

> Method for removing temp tables created by registerAsTable
> --
>
> Key: SPARK-2189
> URL: https://issues.apache.org/jira/browse/SPARK-2189
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Michael Armbrust
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2189) Method for removing temp tables created by registerAsTable

2014-08-24 Thread Venkata Ramana G (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108768#comment-14108768
 ] 

Venkata Ramana G commented on SPARK-2189:
-

Please assign this to me.

> Method for removing temp tables created by registerAsTable
> --
>
> Key: SPARK-2189
> URL: https://issues.apache.org/jira/browse/SPARK-2189
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Michael Armbrust
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org