[jira] [Commented] (SPARK-12648) UDF with Option[Double] throws ClassCastException

2016-01-10 Thread Mikael Valot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091556#comment-15091556
 ] 

Mikael Valot commented on SPARK-12648:
--

Thanks everyone. [~viirya] This behaviour can be handy. However, if I want to 
handle the None case in the UDF and replace it with something else, as in my 
example above, I have to use a java.lang.Double and do a null check.
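
For reference, a minimal sketch of that java.lang.Double workaround (assuming the 
same df as in the quoted description below; the function name and the default 
value 0.0 are only illustrative):

{code}
import org.apache.spark.sql.functions.udf

// A boxed java.lang.Double argument arrives as null for missing values,
// so the None case can be handled with an explicit null check.
val addTwoOrDefault = udf((d: java.lang.Double) => if (d == null) 0.0 else d + 2)
df.withColumn("plusTwo", addTwoOrDefault(df("weight"))).show()
{code}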



> UDF with Option[Double] throws ClassCastException
> -
>
> Key: SPARK-12648
> URL: https://issues.apache.org/jira/browse/SPARK-12648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's
> schema is correctly inferred to be a nullable double.
> However, I cannot seem to write a UDF that takes an Option as an argument:
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", 
> "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2)) 
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> =>
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR 
> org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
>   at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(:18) 
> ~[na:na]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[na:na]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> ~[scala-library-2.10.5.jar:na]






[jira] [Created] (SPARK-12747) Postgres JDBC ArrayType(DoubleType) 'Unable to find server array type'

2016-01-10 Thread Brandon Bradley (JIRA)
Brandon Bradley created SPARK-12747:
---

 Summary: Postgres JDBC ArrayType(DoubleType) 'Unable to find 
server array type'
 Key: SPARK-12747
 URL: https://issues.apache.org/jira/browse/SPARK-12747
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Brandon Bradley


Hello,

I'm getting this exception when trying to use DataFrame.write.jdbc on a 
DataFrame with a column of type ArrayType(DoubleType).

{noformat}
org.postgresql.util.PSQLException: Unable to find server array type for 
provided name double precision
{noformat}

The JDBC driver is definitely on the driver and executor classpath, as I have 
other code that works without ArrayType. I'm not sure how to proceed with debugging.
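
For reference, a minimal sketch of the failing write (the JDBC URL, table name, 
and credentials below are placeholders, and sqlContext/sc are the usual shell 
contexts):

{code}
import java.util.Properties
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// One column of type ArrayType(DoubleType).
val schema = StructType(StructField("values", ArrayType(DoubleType)) :: Nil)
val df = sqlContext.createDataFrame(
  sc.parallelize(Seq(Row(Seq(1.0, 2.0, 3.0)))), schema)

// Writing to Postgres over JDBC is what raises the PSQLException above.
val props = new Properties()
props.setProperty("user", "postgres")
df.write.jdbc("jdbc:postgresql://localhost:5432/testdb", "arrays_test", props)
{code}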






[jira] [Updated] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12744:
-
Labels: release_notes releasenotes  (was: )

> Inconsistent behavior parsing JSON with unix timestamp values
> -
>
> Key: SPARK-12744
> URL: https://issues.apache.org/jira/browse/SPARK-12744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Anatoliy Plastinin
>Priority: Minor
>  Labels: release_notes, releasenotes
>
> Let's have the following JSON:
> {code}
> val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
> {code}
> Spark SQL casts int to timestamp, treating the int value as a number of seconds.
> https://issues.apache.org/jira/browse/SPARK-11724
> {code}
> scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
> ++
> |  ts|
> ++
> |2016-01-10 01:37:...|
> ++
> {code}
> However, parsing the JSON with a schema gives a different result:
> {code}
> scala> val schema = (new StructType).add("ts", TimestampType)
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(ts,TimestampType,true))
> scala> sqlContext.read.schema(schema).json(rdd).show
> ++
> |  ts|
> ++
> |1970-01-17 20:26:...|
> ++
> {code}
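
A possible workaround, sketched here and not verified against this report: declare 
the field as LongType in the schema and cast explicitly, which goes through the 
seconds-based conversion shown above.

{code}
import org.apache.spark.sql.types._
import sqlContext.implicits._

// Read the unix-timestamp field as a plain long, then cast it to timestamp explicitly.
val longSchema = (new StructType).add("ts", LongType)
sqlContext.read.schema(longSchema).json(rdd).select($"ts".cast(TimestampType)).show()
{code}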






[jira] [Updated] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-12744:
-
Target Version/s: 2.0.0

> Inconsistent behavior parsing JSON with unix timestamp values
> -
>
> Key: SPARK-12744
> URL: https://issues.apache.org/jira/browse/SPARK-12744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Anatoliy Plastinin
>Priority: Minor
>  Labels: release_notes, releasenotes
>
> Let's have the following JSON:
> {code}
> val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
> {code}
> Spark SQL casts int to timestamp, treating the int value as a number of seconds.
> https://issues.apache.org/jira/browse/SPARK-11724
> {code}
> scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
> ++
> |  ts|
> ++
> |2016-01-10 01:37:...|
> ++
> {code}
> However, parsing the JSON with a schema gives a different result:
> {code}
> scala> val schema = (new StructType).add("ts", TimestampType)
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(ts,TimestampType,true))
> scala> sqlContext.read.schema(schema).json(rdd).show
> ++
> |  ts|
> ++
> |1970-01-17 20:26:...|
> ++
> {code}






[jira] [Updated] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests

2016-01-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10359:
---
Fix Version/s: 1.5.3

> Enumerate Spark's dependencies in a file and diff against it for new pull 
> requests 
> ---
>
> Key: SPARK-10359
> URL: https://issues.apache.org/jira/browse/SPARK-10359
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Project Infra
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.1, 2.0.0
>
>
> Sometimes when we have dependency changes it can be pretty unclear what 
> transitive set of things are changing. If we enumerate all of the 
> dependencies and put them in a source file in the repo, we can make it so 
> that it is very explicit what is changing.






[jira] [Commented] (SPARK-12646) Support _HOST in kerberos principal for connecting to secure cluster

2016-01-10 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091404#comment-15091404
 ] 

Marcelo Vanzin commented on SPARK-12646:


That sounds really weird. Why are you launching Spark jobs with YARN's 
credentials instead of your own users'? That sounds like a bad idea.

> Support _HOST in kerberos principal for connecting to secure cluster
> 
>
> Key: SPARK-12646
> URL: https://issues.apache.org/jira/browse/SPARK-12646
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Hari Krishna Dara
>Priority: Minor
>  Labels: security
>
> Hadoop supports _HOST as a token that is dynamically replaced with the actual 
> hostname at the time Kerberos authentication is done. This is supported in 
> many Hadoop stacks, including YARN. When configuring Spark to connect to a 
> secure cluster (e.g., yarn-cluster or yarn-client as master), it would be 
> natural to extend support for this token to Spark as well. 






[jira] [Commented] (SPARK-12734) Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091425#comment-15091425
 ] 

Apache Spark commented on SPARK-12734:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10691

> Fix Netty exclusions and use Maven Enforcer to prevent bug from being 
> reintroduced
> --
>
> Key: SPARK-12734
> URL: https://issues.apache.org/jira/browse/SPARK-12734
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> Netty classes are published under artifacts with different names, so our 
> build needs to exclude the {{io.netty:netty}} and {{org.jboss.netty:netty}} 
> versions of the Netty artifact. However, our existing exclusions were 
> incomplete, leading to situations where duplicate Netty classes would wind up 
> on the classpath and cause compile errors (or worse).
> We should fix this and should also start using Maven Enforcer's dependency 
> banning mechanisms to prevent this problem from ever being reintroduced.






[jira] [Commented] (SPARK-10898) Setting spark.streaming.concurrentJobs causes blocks to be deleted before read

2016-01-10 Thread Praveen Devarao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091478#comment-15091478
 ] 

Praveen Devarao commented on SPARK-10898:
-

Hi [~mark.goodall]

Is this still a valid issue, reproducible on Spark 1.6? If so, could you 
provide more info on your configuration and the steps to reproduce? I would 
like to take a shot at this and help resolve it.

Thanks

Praveen

> Setting spark.streaming.concurrentJobs causes blocks to be deleted before read
> --
>
> Key: SPARK-10898
> URL: https://issues.apache.org/jira/browse/SPARK-10898
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.4.1
> Environment: CentOS 6.6
>Reporter: Mark Goodall
>
> The scheduler deletes the block just before it is used for the first time. 
> The input storage level is set to memory and disk, serialized.
> 15/10/01 15:10:04 INFO scheduler.InputInfoTracker: remove old batch metadata: 
> 1443708599000 ms 1443708602000 ms 1443708601000 ms 144370860 ms
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708601800 on discos8.localdomain:45076 in memory (size: 8.7 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708602000 on discos8.localdomain:45076 in memory (size: 8.7 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708602200 on discos8.localdomain:45076 in memory (size: 7.3 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708602400 on discos8.localdomain:45076 in memory (size: 5.7 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708602600 on discos8.localdomain:45076 in memory (size: 2.6 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708599800 on discos8.localdomain:45076 in memory (size: 5.8 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-144370860 on discos8.localdomain:45076 in memory (size: 6.4 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708600200 on discos8.localdomain:45076 in memory (size: 7.0 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708600400 on discos8.localdomain:45076 in memory (size: 6.9 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708600600 on discos8.localdomain:45076 in memory (size: 3.8 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708600800 on discos8.localdomain:45076 in memory (size: 4.2 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708601000 on discos8.localdomain:45076 in memory (size: 4.7 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708601200 on discos8.localdomain:45076 in memory (size: 5.4 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708601400 on discos8.localdomain:45076 in memory (size: 5.5 MB, 
> free: 3.0 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708601600 on discos8.localdomain:45076 in memory (size: 8.9 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708598800 on discos8.localdomain:45076 in memory (size: 8.1 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708599000 on discos8.localdomain:45076 in memory (size: 7.8 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708599200 on discos8.localdomain:45076 in memory (size: 5.9 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708599400 on discos8.localdomain:45076 in memory (size: 6.0 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Removed 
> input-0-1443708599600 on discos8.localdomain:45076 in memory (size: 6.6 MB, 
> free: 3.1 GB)
> 15/10/01 15:10:04 INFO storage.BlockManagerInfo: Added input-0-1443708604600 
> in memory on discos8.localdomain:45076 (size: 8.7 MB, free: 3.1 GB)
> 15/10/01 15:10:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 
> (TID 84, discos1.localdomain): java.lang.Exception: Could not compute split, 
> block input-0-1443708599800 not found
>   at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at 

[jira] [Commented] (SPARK-12646) Support _HOST in kerberos principal for connecting to secure cluster

2016-01-10 Thread Hari Krishna Dara (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091395#comment-15091395
 ] 

Hari Krishna Dara commented on SPARK-12646:
---

Marcelo, I need this for the same reason that Hadoop needs it (which you already 
mentioned). Basically, my Spark jobs could be triggered from any of the many 
NodeManagers in the cluster, so if I have the Spark principal configured the 
same way as for Hadoop (with the _HOST placeholder), then configuring it becomes 
easier, just as it is for Hadoop. In my case, I am also sharing the same 
principal between YARN and Spark, so it makes even more sense.

> Support _HOST in kerberos principal for connecting to secure cluster
> 
>
> Key: SPARK-12646
> URL: https://issues.apache.org/jira/browse/SPARK-12646
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Hari Krishna Dara
>Priority: Minor
>  Labels: security
>
> Hadoop supports _HOST as a token that is dynamically replaced with the actual 
> hostname at the time Kerberos authentication is done. This is supported in 
> many Hadoop stacks, including YARN. When configuring Spark to connect to a 
> secure cluster (e.g., yarn-cluster or yarn-client as master), it would be 
> natural to extend support for this token to Spark as well. 






[jira] [Resolved] (SPARK-12734) Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced

2016-01-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-12734.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10672
[https://github.com/apache/spark/pull/10672]

> Fix Netty exclusions and use Maven Enforcer to prevent bug from being 
> reintroduced
> --
>
> Key: SPARK-12734
> URL: https://issues.apache.org/jira/browse/SPARK-12734
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> Netty classes are published under artifacts with different names, so our 
> build needs to exclude the {{io.netty:netty}} and {{org.jboss.netty:netty}} 
> versions of the Netty artifact. However, our existing exclusions were 
> incomplete, leading to situations where duplicate Netty classes would wind up 
> on the classpath and cause compile errors (or worse).
> We should fix this and should also start using Maven Enforcer's dependency 
> banning mechanisms to prevent this problem from ever being reintroduced.






[jira] [Resolved] (SPARK-3873) Scala style: check import ordering

2016-01-10 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-3873.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Scala style: check import ordering
> --
>
> Key: SPARK-3873
> URL: https://issues.apache.org/jira/browse/SPARK-3873
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Reynold Xin
>Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
>







[jira] [Created] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-12746:
---

 Summary: ArrayType(_, true) should also accept ArrayType(_, false)
 Key: SPARK-12746
 URL: https://issues.apache.org/jira/browse/SPARK-12746
 Project: Spark
  Issue Type: Bug
  Components: ML, SQL
Affects Versions: 1.6.0
Reporter: Earthson Lu


I see CountVectorizer has a schema check for ArrayType, which requires 
ArrayType(StringType, true). 

ArrayType(StringType, false) is just a special case of ArrayType(StringType, true), 
but it will not pass this type check.
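
To illustrate the mismatch (a small standalone sketch; the equality comparison 
stands in for the schema check):

{code}
import org.apache.spark.sql.types._

val nonNullable = ArrayType(StringType, containsNull = false)
val nullable    = ArrayType(StringType, containsNull = true)

// The two types are not equal, so an equality-based schema check rejects the
// non-nullable variant even though its values are all valid for the nullable one.
println(nonNullable == nullable)  // false
{code}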






[jira] [Commented] (SPARK-12734) Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091420#comment-15091420
 ] 

Apache Spark commented on SPARK-12734:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10690

> Fix Netty exclusions and use Maven Enforcer to prevent bug from being 
> reintroduced
> --
>
> Key: SPARK-12734
> URL: https://issues.apache.org/jira/browse/SPARK-12734
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> Netty classes are published under artifacts with different names, so our 
> build needs to exclude the {{io.netty:netty}} and {{org.jboss.netty:netty}} 
> versions of the Netty artifact. However, our existing exclusions were 
> incomplete, leading to situations where duplicate Netty classes would wind up 
> on the classpath and cause compile errors (or worse).
> We should fix this and should also start using Maven Enforcer's dependency 
> banning mechanisms to prevent this problem from ever being reintroduced.






[jira] [Commented] (SPARK-12740) grouping()/grouping_id() should work with having and order by

2016-01-10 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091418#comment-15091418
 ] 

Liang-Chi Hsieh commented on SPARK-12740:
-

[~davies] Do we have the functions grouping and grouping_id? I guess that they 
are not GROUPING__ID, right?

> grouping()/grouping_id() should work with having and order by
> -
>
> Key: SPARK-12740
> URL: https://issues.apache.org/jira/browse/SPARK-12740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> The following query should work
> {code}
> select a, b, sum(c) from t group by cube(a, b) having grouping(a) = 0 order 
> by grouping_id(a, b)
> {code}






[jira] [Commented] (SPARK-12646) Support _HOST in kerberos principal for connecting to secure cluster

2016-01-10 Thread Hari Krishna Dara (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091406#comment-15091406
 ] 

Hari Krishna Dara commented on SPARK-12646:
---

In this environment, users don't have direct shell access, and Spark access 
is controlled via another frontend that uses different credentials. 
This is how it is set up and I can't change it :(

> Support _HOST in kerberos principal for connecting to secure cluster
> 
>
> Key: SPARK-12646
> URL: https://issues.apache.org/jira/browse/SPARK-12646
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Hari Krishna Dara
>Priority: Minor
>  Labels: security
>
> Hadoop supports _HOST as a token that is dynamically replaced with the actual 
> hostname at the time Kerberos authentication is done. This is supported in 
> many Hadoop stacks, including YARN. When configuring Spark to connect to a 
> secure cluster (e.g., yarn-cluster or yarn-client as master), it would be 
> natural to extend support for this token to Spark as well. 






[jira] [Assigned] (SPARK-12652) Upgrade py4j to the incoming version 0.9.1

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12652:


Assignee: (was: Apache Spark)

> Upgrade py4j to the incoming version 0.9.1
> --
>
> Key: SPARK-12652
> URL: https://issues.apache.org/jira/browse/SPARK-12652
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>
> Upgrade py4j when py4j 0.9.1 is out. Mostly because it fixes two critical 
> issues: SPARK-12511 and SPARK-12617






[jira] [Commented] (SPARK-12652) Upgrade py4j to the incoming version 0.9.1

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091483#comment-15091483
 ] 

Apache Spark commented on SPARK-12652:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/10692

> Upgrade py4j to the incoming version 0.9.1
> --
>
> Key: SPARK-12652
> URL: https://issues.apache.org/jira/browse/SPARK-12652
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>
> Upgrade py4j when py4j 0.9.1 is out. Mostly because it fixes two critical 
> issues: SPARK-12511 and SPARK-12617






[jira] [Commented] (SPARK-12648) UDF with Option[Double] throws ClassCastException

2016-01-10 Thread kevin yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091484#comment-15091484
 ] 

kevin yu commented on SPARK-12648:
--

Hello Jakob & Liang-Chi: Thanks for the help. Kevin

> UDF with Option[Double] throws ClassCastException
> -
>
> Key: SPARK-12648
> URL: https://issues.apache.org/jira/browse/SPARK-12648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Mikael Valot
>
> I can write a UDF that returns an Option[Double], and the DataFrame's
> schema is correctly inferred to be a nullable double.
> However, I cannot seem to write a UDF that takes an Option as an argument:
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkContext, SparkConf}
> val conf = new SparkConf().setMaster("local[4]").setAppName("test")
> val sc = new SparkContext(conf)
> val sqlc = new SQLContext(sc)
> import sqlc.implicits._
> val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", 
> "weight")
> import org.apache.spark.sql.functions._
> val addTwo = udf((d: Option[Double]) => d.map(_+2)) 
> df.withColumn("plusTwo", addTwo(df("weight"))).show()
> =>
> 2016-01-05T14:41:52 Executor task launch worker-0 ERROR 
> org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
> java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
>   at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(:18) 
> ~[na:na]
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source) ~[na:na]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at 
> org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
>  ~[spark-sql_2.10-1.6.0.jar:1.6.0]
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> ~[scala-library-2.10.5.jar:na]






[jira] [Assigned] (SPARK-12652) Upgrade py4j to the incoming version 0.9.1

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12652:


Assignee: Apache Spark

> Upgrade py4j to the incoming version 0.9.1
> --
>
> Key: SPARK-12652
> URL: https://issues.apache.org/jira/browse/SPARK-12652
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> Upgrade py4j when py4j 0.9.1 is out. Mostly because it fixes two critical 
> issues: SPARK-12511 and SPARK-12617






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091487#comment-15091487
 ] 

Earthson Lu commented on SPARK-12746:
-

I could work on this:)

I have some idea:

1. we could implement a more powerful type check api
2. check manually for all the case

I will choose the latter
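
For what it's worth, a sketch of what the manual check (option 2) could look like, 
matching on the element type and ignoring containsNull (the helper name is 
illustrative, not the actual CountVectorizer code):

{code}
import org.apache.spark.sql.types._

// Accept ArrayType(StringType, _) regardless of the containsNull flag.
def isStringArray(dt: DataType): Boolean = dt match {
  case ArrayType(StringType, _) => true
  case _                        => false
}

require(isStringArray(ArrayType(StringType, containsNull = false)))  // passes
require(isStringArray(ArrayType(StringType, containsNull = true)))   // passes
{code}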

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has a schema check for ArrayType, which requires 
> ArrayType(StringType, true). 
> ArrayType(StringType, false) is just a special case of ArrayType(StringType, true), 
> but it will not pass this type check.






[jira] [Comment Edited] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091487#comment-15091487
 ] 

Earthson Lu edited comment on SPARK-12746 at 1/11/16 6:11 AM:
--

I could work on this:)

I have some idea:

1. we could implement a more powerful type check api
2. check manually for all the cases

I will choose the latter


was (Author: earthsonlu):
I could work on this:)

I have some idea:

1. we could implement a more powerful type check api
2. check manually for all the case

I will choose the latter

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has a schema check for ArrayType, which requires 
> ArrayType(StringType, true). 
> ArrayType(StringType, false) is just a special case of ArrayType(StringType, true), 
> but it will not pass this type check.






[jira] [Created] (SPARK-12748) Failed to create HiveContext in SparkSql

2016-01-10 Thread Ujjal Satpathy (JIRA)
Ujjal Satpathy created SPARK-12748:
--

 Summary: Failed to create HiveContext in SparkSql
 Key: SPARK-12748
 URL: https://issues.apache.org/jira/browse/SPARK-12748
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.6.0
Reporter: Ujjal Satpathy
Priority: Critical


Hi,
I am trying to create a HiveContext using the Java API in Spark SQL (ver 1.6.0).

HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc);

But it's not creating any HiveContext and is throwing the exception below:
java.sql.SQLException: Failed to start database 'metastore_db' with class 
loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@17a1ba8d.






[jira] [Commented] (SPARK-12734) Fix Netty exclusions and use Maven Enforcer to prevent bug from being reintroduced

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091538#comment-15091538
 ] 

Apache Spark commented on SPARK-12734:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10693

> Fix Netty exclusions and use Maven Enforcer to prevent bug from being 
> reintroduced
> --
>
> Key: SPARK-12734
> URL: https://issues.apache.org/jira/browse/SPARK-12734
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 2.0.0
>
>
> Netty classes are published under artifacts with different names, so our 
> build needs to exclude the {{io.netty:netty}} and {{org.jboss.netty:netty}} 
> versions of the Netty artifact. However, our existing exclusions were 
> incomplete, leading to situations where duplicate Netty classes would wind up 
> on the classpath and cause compile errors (or worse).
> We should fix this and should also start using Maven Enforcer's dependency 
> banning mechanisms to prevent this problem from ever being reintroduced.






[jira] [Commented] (SPARK-12736) Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps

2016-01-10 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091039#comment-15091039
 ] 

Jacek Laskowski commented on SPARK-12736:
-

Good point! I didn't think about it. Thanks.

> Standalone Master cannot be started due to NoClassDefFoundError: 
> org/spark-project/guava/collect/Maps
> -
>
> Key: SPARK-12736
> URL: https://issues.apache.org/jira/browse/SPARK-12736
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Jacek Laskowski
> Fix For: 2.0.0
>
>
> After 
> https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c
>  starting standalone Master (using {{./sbin/start-master.sh}}) fails with the 
> following exception:
> {code}
> Spark Command: 
> /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java
> -cp 
> /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar
> -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip japila.local
> --port 7077 --webui-port 8080
> 
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/spark-project/guava/collect/Maps
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.(MetricsRegistry.java:42)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:94)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:141)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:38)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:36)
> at 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
> at 
> org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:236)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2156)
> at org.apache.spark.SecurityManager.(SecurityManager.scala:214)
> at 
> org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1108)
> at org.apache.spark.deploy.master.Master$.main(Master.scala:1093)
> at org.apache.spark.deploy.master.Master.main(Master.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.spark-project.guava.collect.Maps
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
> {code}






[jira] [Resolved] (SPARK-12741) DataFrame count method return wrong size.

2016-01-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12741.
---
Resolution: Cannot Reproduce

[~sasi2103] this isn't a useful report, since you included no info about how to 
reproduce it. I'm not able to reproduce any problems with dataframe counts.

Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a JIRA and consider whether you've provided an actionable report first.

> DataFrame count method return wrong size.
> -
>
> Key: SPARK-12741
> URL: https://issues.apache.org/jira/browse/SPARK-12741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Sasi
>
> Hi,
> I noted that the DataFrame count method always returns the wrong size.
> Assume I have 11 records.
> When running dataframe.count() I get 9.
> Also, if I run dataframe.collectAsList() then I'll get 9 records instead 
> of 11.
> But if I run dataframe.collect() then I'll get 11.
> Thanks,
> Sasi






[jira] [Comment Edited] (SPARK-12691) Multiple unionAll on Dataframe seems to cause repeated calculations in a "Fibonacci" manner

2016-01-10 Thread Allen Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090908#comment-15090908
 ] 

Allen Liang edited comment on SPARK-12691 at 1/10/16 10:30 AM:
---

Hi Bo Meng,

I understand your point, but I don't think this has anything to do with the size 
of the DataFrame. Otherwise, how do you explain that this behavior doesn't happen 
when we do the same thing to RDDs? If you union all the RDDs of the DataFrames in 
the sample code above, you'll find each round of RDD union takes a relatively 
constant time (NOT growing at all), which is expected.

The attached code is a simple sample to reproduce this issue, and the time cost 
may not seem terrible here. However, in our real case we have 202 DataFrames 
(which all happen to be empty, i.e. size zero) to unionAll, and it took over 
20 minutes to complete, which is obviously not acceptable.

To work around this issue we directly unioned the RDDs of those 202 DataFrames 
and converted the final RDD back to a DataFrame at the end. That whole workaround 
took only around 10 seconds to complete. Compared to 20+ minutes when we unionAll 
202 empty DataFrames, this is already a huge improvement.
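
A sketch of that workaround, assuming the dfs sequence from the sample code quoted 
below and that all DataFrames share the same schema:

{code}
// Union the underlying RDDs (constant time per step), then convert back once.
val schema = dfs.head.schema
val unionedRdd = dfs.map(_.rdd).reduce(_ union _)
val unionedDf = sqlContext.createDataFrame(unionedRdd, schema)
{code}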

I think there has to be something wrong in the repeated DataFrame unionAll, or 
at least something we can improve here.


was (Author: lliang):
Hi Bo Meng,

I understand your point, but I don't think this has anything to do with size of 
dataframe. Or how do you explain this behavior doesn't happen when we do the 
same thing to RDDs. If you union all the RDDs in dataframes in above sample 
code, you'll find each round of RDD union takes a relatively constant time (NOT 
growing at all), which is expected.

The code attached is a simple sample to reproduce this issue and the time 
costed may not seem to be terrible here. However, in our real case, where we we 
have 202 dataframes (which all happen to be empty dataframe (meaning size is 
zero)) to unionAll, and it took around over 20 minutes to complete, which 
obviously is not acceptable.

To workaround this issue we actually directly unioned all the RDDs in those 202 
dataframes and convert back the final RDD to dataframe in the end. And that 
whole workaround took around only 20+ seconds to complete. Compared to 20+ 
minutes when we unionAll 202 empty dataframes, this already is a huge 
improvement.

I think there has to be something wrong in the multiple dataframe unionAll or 
let's say there has to be something we can improve here.

> Multiple unionAll on Dataframe seems to cause repeated calculations in a 
> "Fibonacci" manner
> ---
>
> Key: SPARK-12691
> URL: https://issues.apache.org/jira/browse/SPARK-12691
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1
> Environment: Tested in Spark 1.3 and 1.4.
>Reporter: Allen Liang
>
> Multiple unionAll on Dataframe seems to cause repeated calculations. Here is 
> the sample code to reproduce this issue.
> val dfs = for (i<-0 to 100) yield {
>   val df = sc.parallelize((0 to 10).zipWithIndex).toDF("A", "B")
>   df
> }
> var i = 1
> val s1 = System.currentTimeMillis()
> dfs.reduce{(a,b)=>{
>   val t1 = System.currentTimeMillis()
>   val dd = a unionAll b
>   val t2 = System.currentTimeMillis()
>   println("Round " + i + " unionAll took " + (t2 - t1) + " ms")
>   i = i + 1
>   dd
>   }
> }
> val s2 = System.currentTimeMillis()
> println((i - 1) + " unionAll took totally " + (s2 - s1) + " ms")
> And it printed as follows. And as you can see, it looks like each unionAll 
> seems to redo all the previous unionAll and therefore took self time plus all 
> previous time, which, not precisely speaking, makes each unionAll look like a 
> "Fibonacci" action.
> BTW, this behaviour doesn't happen if I directly union all the RDDs in 
> Dataframes.
> - output start 
> Round 1 unionAll took 1 ms
> Round 2 unionAll took 1 ms
> Round 3 unionAll took 1 ms
> Round 4 unionAll took 1 ms
> Round 5 unionAll took 1 ms
> Round 6 unionAll took 1 ms
> Round 7 unionAll took 1 ms
> Round 8 unionAll took 2 ms
> Round 9 unionAll took 2 ms
> Round 10 unionAll took 2 ms
> Round 11 unionAll took 3 ms
> Round 12 unionAll took 3 ms
> Round 13 unionAll took 3 ms
> Round 14 unionAll took 3 ms
> Round 15 unionAll took 3 ms
> Round 16 unionAll took 4 ms
> Round 17 unionAll took 4 ms
> Round 18 unionAll took 4 ms
> Round 19 unionAll took 4 ms
> Round 20 unionAll took 4 ms
> Round 21 unionAll took 5 ms
> Round 22 unionAll took 5 ms
> Round 23 unionAll took 5 ms
> Round 24 unionAll took 5 ms
> Round 25 unionAll took 5 ms
> Round 26 unionAll took 6 ms
> Round 27 unionAll took 6 ms
> Round 28 unionAll took 6 ms
> 

[jira] [Updated] (SPARK-12736) Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps

2016-01-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12736:
--
Assignee: Jacek Laskowski

> Standalone Master cannot be started due to NoClassDefFoundError: 
> org/spark-project/guava/collect/Maps
> -
>
> Key: SPARK-12736
> URL: https://issues.apache.org/jira/browse/SPARK-12736
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Jacek Laskowski
> Fix For: 2.0.0
>
>
> After 
> https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c
>  starting standalone Master (using {{./sbin/start-master.sh}}) fails with the 
> following exception:
> {code}
> Spark Command: 
> /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java
> -cp 
> /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar
> -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip japila.local
> --port 7077 --webui-port 8080
> 
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/spark-project/guava/collect/Maps
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.(MetricsRegistry.java:42)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:94)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:141)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:38)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:36)
> at 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
> at 
> org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:236)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2156)
> at org.apache.spark.SecurityManager.(SecurityManager.scala:214)
> at 
> org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1108)
> at org.apache.spark.deploy.master.Master$.main(Master.scala:1093)
> at org.apache.spark.deploy.master.Master.main(Master.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.spark-project.guava.collect.Maps
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
> {code}






[jira] [Commented] (SPARK-12736) Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps

2016-01-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090975#comment-15090975
 ] 

Sean Owen commented on SPARK-12736:
---

Rather than open a new JIRA, you should open a PR against the old one, if it's 
clearly a fix/add-on to the other change. Otherwise there's not a strong 
connection between the two changes, but logically they must go together.

> Standalone Master cannot be started due to NoClassDefFoundError: 
> org/spark-project/guava/collect/Maps
> -
>
> Key: SPARK-12736
> URL: https://issues.apache.org/jira/browse/SPARK-12736
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
> Fix For: 2.0.0
>
>
> After 
> https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c
>  starting standalone Master (using {{./sbin/start-master.sh}}) fails with the 
> following exception:
> {code}
> Spark Command: 
> /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java
> -cp 
> /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar
> -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip japila.local
> --port 7077 --webui-port 8080
> 
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/spark-project/guava/collect/Maps
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.(MetricsRegistry.java:42)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:94)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:141)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:38)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:36)
> at 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
> at 
> org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:236)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2156)
> at org.apache.spark.SecurityManager.(SecurityManager.scala:214)
> at 
> org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1108)
> at org.apache.spark.deploy.master.Master$.main(Master.scala:1093)
> at org.apache.spark.deploy.master.Master.main(Master.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.spark-project.guava.collect.Maps
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
> {code}






[jira] [Created] (SPARK-12741) DataFrame count method return wrong size.

2016-01-10 Thread Sasi (JIRA)
Sasi created SPARK-12741:


 Summary: DataFrame count method return wrong size.
 Key: SPARK-12741
 URL: https://issues.apache.org/jira/browse/SPARK-12741
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Sasi


Hi,
I noted that the DataFrame count method always returns the wrong size.
Assume I have 11 records.
When running dataframe.count() I get 9.
Also, if I run dataframe.collectAsList() then I'll get 9 records instead 
of 11.
But if I run dataframe.collect() then I'll get 11.

Thanks,
Sasi






[jira] [Resolved] (SPARK-12736) Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps

2016-01-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12736.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10674
[https://github.com/apache/spark/pull/10674]

> Standalone Master cannot be started due to NoClassDefFoundError: 
> org/spark-project/guava/collect/Maps
> -
>
> Key: SPARK-12736
> URL: https://issues.apache.org/jira/browse/SPARK-12736
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
> Fix For: 2.0.0
>
>
> After 
> https://github.com/apache/spark/commit/659fd9d04b988d48960eac4f352ca37066f43f5c
>  starting standalone Master (using {{./sbin/start-master.sh}}) fails with the 
> following exception:
> {code}
> Spark Command: 
> /Library/Java/JavaVirtualMachines/Current/Contents/Home/bin/java
> -cp 
> /Users/jacek/dev/oss/spark/conf/:/Users/jacek/dev/oss/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.7.1.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-core-3.2.10.jar:/Users/jacek/dev/oss/spark/lib_managed/jars/datanucleus-rdbms-3.2.9.jar
> -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip japila.local
> --port 7077 --webui-port 8080
> 
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/spark-project/guava/collect/Maps
> at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.(MetricsRegistry.java:42)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:94)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.(MetricsSystemImpl.java:141)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:38)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:36)
> at 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
> at 
> org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:236)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at 
> org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2156)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2156)
> at org.apache.spark.SecurityManager.(SecurityManager.scala:214)
> at 
> org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1108)
> at org.apache.spark.deploy.master.Master$.main(Master.scala:1093)
> at org.apache.spark.deploy.master.Master.main(Master.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.spark-project.guava.collect.Maps
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 15 more
> {code}






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-10 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091068#comment-15091068
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I created a new PR which is based on the master branch - 
https://github.com/apache/spark/pull/10681

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that 
> is not compatible with the old one. So I added the new consumer API in separate 
> classes in the package org.apache.spark.streaming.kafka.v09, with the changed 
> API. I didn't remove the old classes, for backward compatibility: users will 
> not need to change their old Spark applications when they upgrade to the new 
> Spark version. Please review my changes.






[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to

2016-01-10 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-12742:
-
Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to   
(was: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer)

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to 
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-10 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-12742:
-
   Due Date: 11/Jan/16
Component/s: SQL

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already 
> exists
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-10 Thread Fei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Wang updated SPARK-12742:
-
Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to 
Table already exists  (was: org.apache.spark.sql.hive.LogicalPlanToSQLSuite 
failure due to )

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already 
> exists
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12692) Scala style: check no white space before comma and colon

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091137#comment-15091137
 ] 

Apache Spark commented on SPARK-12692:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/10683

> Scala style: check no white space before comma and colon
> 
>
> Key: SPARK-12692
> URL: https://issues.apache.org/jira/browse/SPARK-12692
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>
> We should not put a white space before `,` and `:`, so let's check for it.
> Because there are lots of existing style violations, I'd like to first add a 
> checker, enable it, and set the level to `warn`.
> Then I'd like to fix the style step by step.
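
For illustration only, a tiny hypothetical snippet of the kind of code such a 
checker would flag versus accept:

{code}
// flagged by the proposed rule: white space before ':' and ','
def addBad(a : Int , b : Int): Int = a + b

// compliant
def addGood(a: Int, b: Int): Int = a + b
{code}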



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12740) grouping()/grouping_id() should work with having and order by

2016-01-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12740:

Component/s: SQL

> grouping()/grouping_id() should work with having and order by
> -
>
> Key: SPARK-12740
> URL: https://issues.apache.org/jira/browse/SPARK-12740
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>
> The following query should work
> {code}
> select a, b, sum(c) from t group by cube(a, b) having grouping(a) = 0 order 
> by grouping_id(a, b)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091067#comment-15091067
 ] 

Apache Spark commented on SPARK-12177:
--

User 'nikit-os' has created a pull request for this issue:
https://github.com/apache/spark/pull/10681

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that is 
> not compatible with the old one. So, I added the new consumer API in separate 
> classes in the package org.apache.spark.streaming.kafka.v09. I did not remove the 
> old classes, in order to preserve backward compatibility, so users will not need 
> to change their existing Spark applications when they upgrade to a new Spark 
> version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer

2016-01-10 Thread Fei Wang (JIRA)
Fei Wang created SPARK-12742:


 Summary: org.apache.spark.sql.hive.LogicalPlanToSQLSuite failuer
 Key: SPARK-12742
 URL: https://issues.apache.org/jira/browse/SPARK-12742
 Project: Spark
  Issue Type: Bug
Reporter: Fei Wang


[info] Exception encountered when attempting to run a suite with class name: 
org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
milliseconds)
[info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
[info]   at 
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
[info]   at 
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
[info]   at 
org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
[info]   at 
org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
[info]   at 
org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
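
One possible mitigation (a sketch only, an assumption rather than the change in the 
actual PR) is to make the fixture creation in beforeAll idempotent, assuming a 
HiveContext-backed `sqlContext` is in scope as in the TestHive-based suites:

{code}
import org.apache.spark.sql.SaveMode

// drop any table left over from a previous run, then recreate it with placeholder data
sqlContext.sql("DROP TABLE IF EXISTS t1")
sqlContext.range(10).write.mode(SaveMode.Overwrite).saveAsTable("t1")
{code}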



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12742:


Assignee: Apache Spark

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already 
> exists
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>Assignee: Apache Spark
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12742:


Assignee: (was: Apache Spark)

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already 
> exists
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12742) org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091109#comment-15091109
 ] 

Apache Spark commented on SPARK-12742:
--

User 'scwf' has created a pull request for this issue:
https://github.com/apache/spark/pull/10682

> org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already 
> exists
> ---
>
> Key: SPARK-12742
> URL: https://issues.apache.org/jira/browse/SPARK-12742
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Fei Wang
>
> [info] Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 
> milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> [info]   at 
> org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
> [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [info]   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12722) Typo in Spark Pipeline example

2016-01-10 Thread Shagun Sodhani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091113#comment-15091113
 ] 

Shagun Sodhani commented on SPARK-12722:


If no one is taking it up, I am willing to submit a PR.

> Typo in Spark Pipeline example
> --
>
> Key: SPARK-12722
> URL: https://issues.apache.org/jira/browse/SPARK-12722
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Tom Chan
>Priority: Trivial
>  Labels: starter
>
> There is a typo in the Pipeline example,
> http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
> Namely, the line
> val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
> should be
> val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
> I was trying to do a PR but somehow there is an error when I try to build the 
> documentation locally, so I hesitate to submit one. Someone who is already 
> contributing to the documentation should be able to fix it in no time. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12743) spark.executor.memory is ignored by spark-submit in Standalone Cluster mode

2016-01-10 Thread Alan Braithwaite (JIRA)
Alan Braithwaite created SPARK-12743:


 Summary: spark.executor.memory is ignored by spark-submit in 
Standalone Cluster mode
 Key: SPARK-12743
 URL: https://issues.apache.org/jira/browse/SPARK-12743
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.6.0
Reporter: Alan Braithwaite


When using spark-submit in standalone cluster mode, `--conf 
spark.executor.memory=Xg` is ignored. Instead, the value in spark-defaults.conf on 
the standalone master is used.

We are using the legacy submission gateway, in case that affects this (we're in the 
process of setting up the REST gateway).
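
A quick way to see which value the application actually ended up with (a minimal 
sketch, assuming a running driver with the usual SparkContext named `sc`):

{code}
// prints the effective executor memory setting on the driver; compare it against
// the value passed via --conf and the one in spark-defaults.conf on the master
println(sc.getConf.get("spark.executor.memory", "<unset, 1g default>"))
{code}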



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12692) Scala style: check no white space before comma and colon

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091221#comment-15091221
 ] 

Apache Spark commented on SPARK-12692:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/10686

> Scala style: check no white space before comma and colon
> 
>
> Key: SPARK-12692
> URL: https://issues.apache.org/jira/browse/SPARK-12692
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>
> We should not put a white space before `,` and `:`, so let's check for it.
> Because there are lots of existing style violations, I'd like to first add a 
> checker, enable it, and set the level to `warn`.
> Then I'd like to fix the style step by step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12692) Scala style: check no white space before comma and colon

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091185#comment-15091185
 ] 

Apache Spark commented on SPARK-12692:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/10685

> Scala style: check no white space before comma and colon
> 
>
> Key: SPARK-12692
> URL: https://issues.apache.org/jira/browse/SPARK-12692
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>
> We should not put a white space before `,` and `:`, so let's check for it.
> Because there are lots of existing style violations, I'd like to first add a 
> checker, enable it, and set the level to `warn`.
> Then I'd like to fix the style step by step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12692) Scala style: check no white space before comma and colon

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091158#comment-15091158
 ] 

Apache Spark commented on SPARK-12692:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/10684

> Scala style: check no white space before comma and colon
> 
>
> Key: SPARK-12692
> URL: https://issues.apache.org/jira/browse/SPARK-12692
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>
> We should not put a white space before `,` and `:`, so let's check for it.
> Because there are lots of existing style violations, I'd like to first add a 
> checker, enable it, and set the level to `warn`.
> Then I'd like to fix the style step by step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Anatoliy Plastinin (JIRA)
Anatoliy Plastinin created SPARK-12744:
--

 Summary: Inconsistent behavior parsing JSON with unix timestamp 
values
 Key: SPARK-12744
 URL: https://issues.apache.org/jira/browse/SPARK-12744
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Anatoliy Plastinin
Priority: Minor


Let's take the following JSON:

{code}
val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
{code}

Spark SQL casts an int to a timestamp, treating the int value as a number of seconds.
https://issues.apache.org/jira/browse/SPARK-11724

{code}
scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
++
|  ts|
++
|2016-01-10 01:37:...|
++
{code}

However, parsing the JSON with a schema gives a different result:

{code}
scala> val schema = (new StructType).add("ts", TimestampType)
schema: org.apache.spark.sql.types.StructType = 
StructType(StructField(ts,TimestampType,true))

scala> sqlContext.read.schema(schema).json(rdd).show
++
|  ts|
++
|1970-01-17 20:26:...|
++
{code}
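
A possible workaround until the two paths behave consistently (a sketch, not a 
confirmed fix): declare the field as a long in the schema and cast it explicitly, so 
that the same seconds-based int-to-timestamp conversion is applied.

{code}
import sqlContext.implicits._
import org.apache.spark.sql.types.{LongType, StructType, TimestampType}

// hypothetical workaround: read ts as LongType, then cast to TimestampType
val longSchema = (new StructType).add("ts", LongType)
sqlContext.read.schema(longSchema).json(rdd)
  .select($"ts".cast(TimestampType).as("ts"))
  .show()
{code}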



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4628) Put external projects and examples behind a build flag

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091311#comment-15091311
 ] 

Apache Spark commented on SPARK-4628:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/10688

> Put external projects and examples behind a build flag
> --
>
> Key: SPARK-4628
> URL: https://issues.apache.org/jira/browse/SPARK-4628
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
>
> This is something we talked about doing for convenience, but I'm escalating 
> this based on realizing today that some of our external projects depend on 
> code that is not in maven central. I.e. if one of these dependencies is taken 
> down (as happened recently with mqtt), all Spark builds will fail.
> The proposal here is simple: have a profile -Pexternal-projects that enables 
> these. This can follow the exact pattern of -Pkinesis-asl, which was disabled by 
> default due to a license issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12745) Limit is not supported inside Set Operation

2016-01-10 Thread Xiao Li (JIRA)
Xiao Li created SPARK-12745:
---

 Summary: Limit is not supported inside Set Operation
 Key: SPARK-12745
 URL: https://issues.apache.org/jira/browse/SPARK-12745
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.6.0
Reporter: Xiao Li


The current SQLContext allow the following query, which is copied from a test 
case in SQLQuerySuite:
{code}
 checkAnswer(sql(
   """
 |select key from ((select * from testData limit 1)
 |  union all (select * from testData limit 1)) x limit 1
   """.stripMargin),
   Row(1)
 )
{code}

However, it is rejected in the Hive parser. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12745) Limit is not supported inside Set Operation

2016-01-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-12745:

Description: 
The current SQLContext allows the following query, which is copied from a test 
case in SQLQuerySuite:
{code}
 checkAnswer(sql(
   """
 |select key from ((select * from testData limit 1)
 |  union all (select * from testData limit 1)) x limit 1
   """.stripMargin),
   Row(1)
 )
{code}

However, it is rejected in the Hive parser. 

  was:
The current SQLContext allow the following query, which is copied from a test 
case in SQLQuerySuite:
{code}
 checkAnswer(sql(
   """
 |select key from ((select * from testData limit 1)
 |  union all (select * from testData limit 1)) x limit 1
   """.stripMargin),
   Row(1)
 )
{code}

However, it is rejected in the Hive parser. 


> Limit is not supported inside Set Operation
> ---
>
> Key: SPARK-12745
> URL: https://issues.apache.org/jira/browse/SPARK-12745
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>
> The current SQLContext allows the following query, which is copied from a 
> test case in SQLQuerySuite:
> {code}
>  checkAnswer(sql(
>"""
>  |select key from ((select * from testData limit 1)
>  |  union all (select * from testData limit 1)) x limit 1
>""".stripMargin),
>Row(1)
>  )
> {code}
> However, it is rejected in the Hive parser. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12646) Support _HOST in kerberos principal for connecting to secure cluster

2016-01-10 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091317#comment-15091317
 ] 

Marcelo Vanzin commented on SPARK-12646:


I don't understand why you need this.

Hadoop needs it because when you bring up a large number of, for example, 
DataNodes or NameNodes, each one needs a different kerberos principal; so 
Hadoop uses {{principal/host@REALM}} to achieve that. To make configuration 
easier, they added support for the {{_HOST}} replacement so that you can 
distribute the same config file on all hosts, and each one would replace 
{{_HOST}} with its own hostname to create a unique kerberos principal.

Spark's case is completely different. Here it's the user's principal and keytab, 
i.e. the user running the application. The probability of a user needing any of the 
above is close to zero.

Do you have a use case for this that you have not explained? Or do you just 
want to copy a feature from Hadoop that doesn't really make a lot of sense in 
Spark?
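
For reference, the {{_HOST}} substitution on the Hadoop side amounts to the 
following (a small sketch using Hadoop's own utility; the principal and hostname 
below are placeholders):

{code}
import org.apache.hadoop.security.SecurityUtil

// expands _HOST with the given hostname, e.g.
// "nn/_HOST@EXAMPLE.COM" -> "nn/node1.example.com@EXAMPLE.COM"
val principal = SecurityUtil.getServerPrincipal("nn/_HOST@EXAMPLE.COM", "node1.example.com")
println(principal)
{code}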

> Support _HOST in kerberos principal for connecting to secure cluster
> 
>
> Key: SPARK-12646
> URL: https://issues.apache.org/jira/browse/SPARK-12646
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Hari Krishna Dara
>Priority: Minor
>  Labels: security
>
> Hadoop supports _HOST as a token that is dynamically replaced with the actual 
> hostname at the time the Kerberos authentication is done. This is supported in 
> many Hadoop stacks, including YARN. When configuring Spark to connect to a secure 
> cluster (e.g., yarn-cluster or yarn-client as master), it would be natural to 
> extend support for this token to Spark as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12744:


Assignee: Apache Spark

> Inconsistent behavior parsing JSON with unix timestamp values
> -
>
> Key: SPARK-12744
> URL: https://issues.apache.org/jira/browse/SPARK-12744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Anatoliy Plastinin
>Assignee: Apache Spark
>Priority: Minor
>
> Let's take the following JSON:
> {code}
> val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
> {code}
> Spark SQL casts an int to a timestamp, treating the int value as a number of seconds.
> https://issues.apache.org/jira/browse/SPARK-11724
> {code}
> scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
> ++
> |  ts|
> ++
> |2016-01-10 01:37:...|
> ++
> {code}
> However, parsing the JSON with a schema gives a different result:
> {code}
> scala> val schema = (new StructType).add("ts", TimestampType)
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(ts,TimestampType,true))
> scala> sqlContext.read.schema(schema).json(rdd).show
> ++
> |  ts|
> ++
> |1970-01-17 20:26:...|
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12744:


Assignee: (was: Apache Spark)

> Inconsistent behavior parsing JSON with unix timestamp values
> -
>
> Key: SPARK-12744
> URL: https://issues.apache.org/jira/browse/SPARK-12744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Anatoliy Plastinin
>Priority: Minor
>
> Let's take the following JSON:
> {code}
> val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
> {code}
> Spark SQL casts an int to a timestamp, treating the int value as a number of seconds.
> https://issues.apache.org/jira/browse/SPARK-11724
> {code}
> scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
> ++
> |  ts|
> ++
> |2016-01-10 01:37:...|
> ++
> {code}
> However, parsing the JSON with a schema gives a different result:
> {code}
> scala> val schema = (new StructType).add("ts", TimestampType)
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(ts,TimestampType,true))
> scala> sqlContext.read.schema(schema).json(rdd).show
> ++
> |  ts|
> ++
> |1970-01-17 20:26:...|
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12744) Inconsistent behavior parsing JSON with unix timestamp values

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091268#comment-15091268
 ] 

Apache Spark commented on SPARK-12744:
--

User 'antlypls' has created a pull request for this issue:
https://github.com/apache/spark/pull/10687

> Inconsistent behavior parsing JSON with unix timestamp values
> -
>
> Key: SPARK-12744
> URL: https://issues.apache.org/jira/browse/SPARK-12744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Anatoliy Plastinin
>Priority: Minor
>
> Let's take the following JSON:
> {code}
> val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
> {code}
> Spark SQL casts an int to a timestamp, treating the int value as a number of seconds.
> https://issues.apache.org/jira/browse/SPARK-11724
> {code}
> scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
> ++
> |  ts|
> ++
> |2016-01-10 01:37:...|
> ++
> {code}
> However, parsing the JSON with a schema gives a different result:
> {code}
> scala> val schema = (new StructType).add("ts", TimestampType)
> schema: org.apache.spark.sql.types.StructType = 
> StructType(StructField(ts,TimestampType,true))
> scala> sqlContext.read.schema(schema).json(rdd).show
> ++
> |  ts|
> ++
> |1970-01-17 20:26:...|
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10359) Enumerate Spark's dependencies in a file and diff against it for new pull requests

2016-01-10 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10359:
---
Fix Version/s: 1.6.1

> Enumerate Spark's dependencies in a file and diff against it for new pull 
> requests 
> ---
>
> Key: SPARK-10359
> URL: https://issues.apache.org/jira/browse/SPARK-10359
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, Project Infra
>Reporter: Patrick Wendell
>Assignee: Josh Rosen
> Fix For: 1.6.1, 2.0.0
>
>
> Sometimes when we have dependency changes it can be pretty unclear what 
> transitive set of things is changing. If we enumerate all of the dependencies and 
> put them in a source file in the repo, we can make it very explicit what is 
> changing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12745) Limit is not supported inside Set Operation

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12745:


Assignee: Apache Spark

> Limit is not supported inside Set Operation
> ---
>
> Key: SPARK-12745
> URL: https://issues.apache.org/jira/browse/SPARK-12745
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> The current SQLContext allows the following query, which is copied from a 
> test case in SQLQuerySuite:
> {code}
>  checkAnswer(sql(
>"""
>  |select key from ((select * from testData limit 1)
>  |  union all (select * from testData limit 1)) x limit 1
>""".stripMargin),
>Row(1)
>  )
> {code}
> However, it is rejected in the Hive parser. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12745) Limit is not supported inside Set Operation

2016-01-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12745:


Assignee: (was: Apache Spark)

> Limit is not supported inside Set Operation
> ---
>
> Key: SPARK-12745
> URL: https://issues.apache.org/jira/browse/SPARK-12745
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Xiao Li
>
> The current SQLContext allows the following query, which is copied from a 
> test case in SQLQuerySuite:
> {code}
>  checkAnswer(sql(
>"""
>  |select key from ((select * from testData limit 1)
>  |  union all (select * from testData limit 1)) x limit 1
>""".stripMargin),
>Row(1)
>  )
> {code}
> However, it is rejected in the Hive parser. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org