[jira] [Closed] (SPARK-15216) Add a new Dataset API explainCodegen
[ https://issues.apache.org/jira/browse/SPARK-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-15216. --- Resolution: Won't Fix Closing this because I don't think it makes sense to have a top level method for something so developer facing. Otherwise the public APIs will be littered with internal developer facing methods. > Add a new Dataset API explainCodegen > > > Key: SPARK-15216 > URL: https://issues.apache.org/jira/browse/SPARK-15216 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > {noformat} > val ds = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDS().groupByKey(_._1).agg( > expr("avg(_2)").as[Double], > ComplexResultAgg.toColumn) > ds.explainCodegen() > {noformat} > Reading codegen output is important for developers to debug. So far, > outputting codegen results is available in the SQL interface by `EXPLAIN > CODEGEN`. However, in the Dataset/DataFrame APIs, we face the same issue. We > can add a new API for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
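Although the top-level Dataset method was declined, the codegen output discussed above can still be inspected. A minimal Scala sketch, assuming a Spark 2.0 `SparkSession` named `spark` (the `debug` package is an internal, developer-facing API and may change between releases):

```scala
import spark.implicits._
import org.apache.spark.sql.execution.debug._

val ds = Seq("a" -> 1, "a" -> 3, "b" -> 3).toDS()

// Workaround 1: the SQL interface already supports EXPLAIN CODEGEN.
ds.createOrReplaceTempView("t")
spark.sql("EXPLAIN CODEGEN SELECT _1, avg(_2) FROM t GROUP BY _1")
  .show(truncate = false)

// Workaround 2: the developer-facing debug package prints the generated
// code for a query directly (internal API, subject to change).
ds.groupBy($"_1").avg("_2").debugCodegen()
```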
[jira] [Resolved] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15210. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}} and > {{UserDefinedType}} are missing.
[jira] [Updated] (SPARK-14341) Throw exception on unsupported Create/Drop Macro DDL commands
[ https://issues.apache.org/jira/browse/SPARK-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-14341: Component/s: SQL > Throw exception on unsupported Create/Drop Macro DDL commands > - > > Key: SPARK-14341 > URL: https://issues.apache.org/jira/browse/SPARK-14341 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng >Priority: Minor > > According to > [SPARK-14123|https://issues.apache.org/jira/browse/SPARK-14123], we need to > throw an exception for Create/Drop Macro DDL.
[jira] [Assigned] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10653: Assignee: Apache Spark > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Apache Spark > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed.
[jira] [Assigned] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10653: Assignee: (was: Apache Spark) > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed.
[jira] [Resolved] (SPARK-14499) Add tests to make sure drop partitions of an external table will not delete data
[ https://issues.apache.org/jira/browse/SPARK-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-14499. - Resolution: Resolved Fix Version/s: 2.0.0 > Add tests to make sure drop partitions of an external table will not delete > data > > > Key: SPARK-14499 > URL: https://issues.apache.org/jira/browse/SPARK-14499 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > Fix For: 2.0.0 > > > This is a follow-up of SPARK-14132 > (https://github.com/apache/spark/pull/12220#issuecomment-207625166) to > address https://github.com/apache/spark/pull/12220#issuecomment-207612627.
[jira] [Reopened] (SPARK-14499) Add tests to make sure drop partitions of an external table will not delete data
[ https://issues.apache.org/jira/browse/SPARK-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-14499: - Reopening to change the status. > Add tests to make sure drop partitions of an external table will not delete > data > > > Key: SPARK-14499 > URL: https://issues.apache.org/jira/browse/SPARK-14499 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > Fix For: 2.0.0 > > > This is a follow-up of SPARK-14132 > (https://github.com/apache/spark/pull/12220#issuecomment-207625166) to > address https://github.com/apache/spark/pull/12220#issuecomment-207612627.
[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276764#comment-15276764 ] Apache Spark commented on SPARK-10653: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/12970 > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed.
[jira] [Resolved] (SPARK-15166) Move hive-specific conf setting from SparkSession
[ https://issues.apache.org/jira/browse/SPARK-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15166. --- Resolution: Fixed Fix Version/s: 2.0.0 > Move hive-specific conf setting from SparkSession > - > > Key: SPARK-15166 > URL: https://issues.apache.org/jira/browse/SPARK-15166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Minor > Fix For: 2.0.0 > >
[jira] [Resolved] (SPARK-15199) Disallow Dropping Built-in Functions
[ https://issues.apache.org/jira/browse/SPARK-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-15199. - Resolution: Resolved Fix Version/s: 2.0.0 > Disallow Dropping Built-in Functions > > > Key: SPARK-15199 > URL: https://issues.apache.org/jira/browse/SPARK-15199 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > Fix For: 2.0.0 > > > As in Hive and the major RDBMSs, built-in functions are not allowed > to be dropped. In the current implementation, users can drop the built-in > functions. However, after dropping the built-in functions, users are unable > to add them back.
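The intended behavior can be sketched with a simple guard. This is illustrative only: the `builtinFunctions` set and `dropFunction` method are hypothetical stand-ins, not Spark's actual catalog internals.

```scala
// Illustrative sketch only: `builtinFunctions` and `dropFunction` are
// hypothetical stand-ins, not Spark's actual catalog code.
val builtinFunctions: Set[String] = Set("abs", "avg", "concat", "max", "min")

def dropFunction(name: String): Unit = {
  // Refuse to drop anything registered as a built-in; once dropped,
  // users would have no way to re-register it.
  require(!builtinFunctions.contains(name.toLowerCase),
    s"Cannot drop built-in function '$name'")
  // ... proceed to remove the user-defined function from the catalog ...
}
```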
[jira] [Updated] (SPARK-15210) Add missing @DeveloperApi annotation in sql.types
[ https://issues.apache.org/jira/browse/SPARK-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15210: -- Assignee: zhengruifeng > Add missing @DeveloperApi annotation in sql.types > - > > Key: SPARK-15210 > URL: https://issues.apache.org/jira/browse/SPARK-15210 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Trivial > Fix For: 2.0.0 > > > @DeveloperApi annotations for {{AbstractDataType}}, {{MapType}} and > {{UserDefinedType}} are missing.
[jira] [Resolved] (SPARK-15220) Add hyperlink to "running application" and "completed application"
[ https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15220. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add hyperlink to "running application" and "completed application" > -- > > Key: SPARK-15220 > URL: https://issues.apache.org/jira/browse/SPARK-15220 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Mao, Wei >Priority: Minor > Fix For: 2.0.0 > > > Add hyperlinks to "running application" and "completed application", so users > can jump to the application table directly. In my environment, I set up 1000+ > workers and it's painful to scroll down to skip the worker list.
[jira] [Commented] (SPARK-14946) Spark 2.0 vs 1.6.1 Query Time(out)
[ https://issues.apache.org/jira/browse/SPARK-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276743#comment-15276743 ] Davies Liu commented on SPARK-14946: [~raymond.honderd...@sizmek.com] It seems that the second job (scan of the bigger table) did not get started; could you try to disable the broadcast join by setting spark.sql.autoBroadcastJoinThreshold to 0? > Spark 2.0 vs 1.6.1 Query Time(out) > -- > > Key: SPARK-14946 > URL: https://issues.apache.org/jira/browse/SPARK-14946 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Raymond Honderdors >Priority: Critical > Attachments: Query Plan 1.6.1.png, screenshot-spark_2.0.png, > spark-defaults.conf, spark-env.sh, version 1.6.1 screen 1 - thrift collect = > true.png, version 1.6.1 screen 1 thrift collect = false.png, version 1.6.1 > screen 2 thrift collect =false.png, version 2.0 -screen 1 thrift collect = > false.png, version 2.0 screen 2 thrift collect = true.png, versiuon 2.0 > screen 1 thrift collect = true.png > > > I run a query using the JDBC driver; on version 1.6.1 it returns after 5–6 min, > while the same query against version 2.0 fails after 2h (due to timeout). > For details on how to reproduce, also see the comments below. Here is what I tried. > I run the following query: select * from pe_servingdata sd inner join > pe_campaigns_gzip c on sd.campaignid = c.campaign_id ; > (with and without a counter and group by on campaigne_id) > I run spark 1.6.1 and Thriftserver, > then run the sql from beeline or squirrel; after a few min I get an answer > (0 rows), which is correct due to the fact my data did not have matching campaign > ids in both tables. > When I run spark 2.0 and Thriftserver, I once again run the sql statement and > after 2:30 min it gives up, but already after 30/60 sec I stop seeing > activity on the spark ui. > (sorry for the delay in completing the description of the bug, I was on and > off work due to national holidays)
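The suggested workaround can be sketched as follows. The property name is real; the `sqlContext` handle is assumed to exist in the session (as it does in spark-shell for 1.6).

```scala
// Sketch of the suggested workaround, assuming an existing Spark 1.6/2.0
// SQLContext named `sqlContext`. Setting the threshold to 0 disables
// broadcast joins, forcing a shuffle-based join instead.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "0")

// From a JDBC/beeline session the equivalent SQL statement is:
//   SET spark.sql.autoBroadcastJoinThreshold=0;
```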
[jira] [Resolved] (SPARK-15067) YARN executors are launched with fixed perm gen size
[ https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15067. --- Resolution: Fixed Assignee: Sean Owen Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > YARN executors are launched with fixed perm gen size > > > Key: SPARK-15067 > URL: https://issues.apache.org/jira/browse/SPARK-15067 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Renato Falchi Brandão >Assignee: Sean Owen >Priority: Minor > Fix For: 2.0.0 > > > It is impossible to change the executors max perm gen size using the property > "spark.executor.extraJavaOptions" when you are running on YARN. > When the JVM option "-XX:MaxPermSize" is set through the property > "spark.executor.extraJavaOptions", Spark put it properly in the shell command > that will start the JVM container but, in the ending of command, it sets > again this option using a fixed value of 256m, as you can see in the log I've > extracted: > 2016-04-30 17:20:12 INFO ExecutorRunnable:58 - > === > YARN executor launch context: > env: > CLASSPATH -> > {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure > SPARK_LOG_URL_STDERR -> > 
http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096 > SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993 > SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166 > SPARK_USER -> h_loadbd > SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC > SPARK_YARN_MODE -> true > SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343 > SPARK_LOG_URL_STDOUT -> > http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096 > SPARK_YARN_CACHE_FILES -> > hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml > command: > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m > -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' > '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp > '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' > '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' > -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname > x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 > --user-class-path file:$PWD/__app__.jar 1> /stdout 2> > /stderr > Analyzing the code is possible to see that all the options set in the > property "spark.executor.extraJavaOptions" are enclosed, one by one, in > single quotes (ExecutorRunnable.scala:151) before the launcher take the > decision if a default value has to be provided or not for the option > "-XX:MaxPermSize" (ExecutorRunnable.scala:202). > This decision is taken examining all the options set and looking for a string > starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). If > that value is not found, the default value is set. 
> Because every option is enclosed in single quotes, no option will ever start with that > string, so a default value will always be provided. > A possible solution is to change the source code of CommandBuilderUtils.java at > line 328: > From-> if (arg.startsWith("-XX:MaxPermSize=")) > To-> if (arg.indexOf("-XX:MaxPermSize=") > -1)
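The mismatch is easy to demonstrate in isolation; a self-contained sketch of the check described above (the quoted argument mirrors what the launcher produces):

```scala
// Self-contained illustration of the bug described above: after the launcher
// wraps each extra Java option in single quotes, a startsWith check for
// "-XX:MaxPermSize=" no longer matches, so the 256m default gets appended.
val quotedArg = "'-XX:MaxPermSize=1024M'"  // as produced by the launcher

val foundByStartsWith = quotedArg.startsWith("-XX:MaxPermSize=")
val foundByIndexOf    = quotedArg.indexOf("-XX:MaxPermSize=") > -1
```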
[jira] [Resolved] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
[ https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15225. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Replace SQLContext with SparkSession in Encoder documentation > - > > Key: SPARK-15225 > URL: https://issues.apache.org/jira/browse/SPARK-15225 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 2.0.0 > > > Encoder's doc mentions sqlContext.implicits._. We should use > sparkSession.implicits._ instead now.
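The documentation change in a nutshell, assuming a Spark 2.0 `SparkSession` named `spark`:

```scala
// With Spark 2.0 the implicit encoders are imported from a SparkSession
// (here assumed to be named `spark`) rather than from a SQLContext.
import spark.implicits._   // was: import sqlContext.implicits._

val ds = Seq(1, 2, 3).toDS()
```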
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Assignee: Philipp Hoffmann > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name.
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Target Version/s: 1.6.2, 2.0.0 (was: 2.0.0) > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name.
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Fix Version/s: 1.6.2 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Philipp Hoffmann >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name.
[jira] [Resolved] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15223. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Trivial > Fix For: 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name.
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-15223: -- Priority: Minor (was: Trivial) > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Minor > Fix For: 2.0.0 > > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name.
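For reference, the current property name as it would appear in `spark-defaults.conf` (the size-based rolling strategy shown here is one of the documented options; the 128 MiB value is only an example):

```
# Current property names (the old spark.executor.logs.rolling.size.maxBytes
# no longer exists); maxSize is in bytes.
spark.executor.logs.rolling.strategy  size
spark.executor.logs.rolling.maxSize   134217728
```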
[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276723#comment-15276723 ] Miles Crawford commented on SPARK-11293: Also biting us in 1.6.1 - we have to repartition our dataset into many thousands of partitions to avoid the following stack: {code} 2016-05-08 16:05:11,941 ERROR org.apache.spark.executor.Executor: Managed memory leak detected; size = 5748783225 bytes, TID = 1283 2016-05-08 16:05:11,948 ERROR org.apache.spark.executor.Executor: Exception in task 116.4 in stage 1.0 (TID 1283) java.lang.OutOfMemoryError: Unable to acquire 2383 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:346) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:367) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at 
org.apache.spark.scheduler.Task.run(Task.scala:89) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]{code} > Spillable collections leak shuffle memory > - > > Key: SPARK-11293 > URL: https://issues.apache.org/jira/browse/SPARK-11293 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0, 1.6.1 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > > I discovered multiple leaks of shuffle memory while working on my memory > manager consolidation patch, which added the ability to do strict memory leak > detection for the bookkeeping that used to be performed by the > ShuffleMemoryManager. This uncovered a handful of places where tasks can > acquire execution/shuffle memory but never release it, starving themselves of > memory. > Problems that I found: > * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution > memory. > * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a > {{CompletionIterator}}. > * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing > its resources.
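The `CompletionIterator` fix mentioned above follows a general pattern: wrap an iterator so that a cleanup callback (e.g. releasing sorter memory via `stop()`) runs exactly once when the iterator is exhausted. A standalone sketch of that pattern (Spark's internal `CompletionIterator` differs in detail):

```scala
// Standalone sketch of the CompletionIterator pattern: run `completion`
// exactly once as soon as the wrapped iterator reports it has no more
// elements, so resources are freed without the caller remembering to.
class CompletionIterator[A](sub: Iterator[A], completion: () => Unit)
    extends Iterator[A] {
  private var completed = false
  def hasNext: Boolean = {
    val more = sub.hasNext
    if (!more && !completed) { completed = true; completion() }
    more
  }
  def next(): A = sub.next()
}
```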
[jira] [Resolved] (SPARK-15093) create/delete/rename directory for InMemoryCatalog operations if needed
[ https://issues.apache.org/jira/browse/SPARK-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-15093. --- Resolution: Fixed Fix Version/s: 2.0.0 > create/delete/rename directory for InMemoryCatalog operations if needed > --- > > Key: SPARK-15093 > URL: https://issues.apache.org/jira/browse/SPARK-15093 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > >
[jira] [Commented] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276693#comment-15276693 ] Vijay Parmar commented on SPARK-15221: -- Thank you, Sean! I am new to Spark and still learning things. I am still not sure about this issue, as all of this log output gets generated when I run the command to start Spark. Will take care of the things you have pointed out. > error: not found: value sqlContext when starting Spark 1.6.1 > > > Key: SPARK-15221 > URL: https://issues.apache.org/jira/browse/SPARK-15221 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor >Reporter: Vijay Parmar >Priority: Blocker > Labels: build, newbie > > When I start Spark (version 1.6.1), at the very end I am getting the > following error message: > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16: error: not found: value sqlContext > import sqlContext.sql > I have gone through some content on the web about editing the /.bashrc file > and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. > Also tried editing the /etc/hosts file with :- > $ sudo vi /etc/hosts > ... > 127.0.0.1 > ... > but still the issue persists. Is it the issue with the build or something > else?
[jira] [Commented] (SPARK-15220) Add hyperlink to "running application" and "completed application"
[ https://issues.apache.org/jira/browse/SPARK-15220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276687#comment-15276687 ] Alex Bozarth commented on SPARK-15220: -- I'll take a look at this in my free moments today; seems quick. > Add hyperlink to "running application" and "completed application" > -- > > Key: SPARK-15220 > URL: https://issues.apache.org/jira/browse/SPARK-15220 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Mao, Wei >Priority: Minor > > Add hyperlinks to "running application" and "completed application", so users > can jump to the application tables directly. In my environment, I set up 1000+ > workers, and it's painful to scroll down past the worker list.
[jira] [Created] (SPARK-15227) InputStream stop-start semantics + empty implementations
Stas Levin created SPARK-15227: -- Summary: InputStream stop-start semantics + empty implementations Key: SPARK-15227 URL: https://issues.apache.org/jira/browse/SPARK-15227 Project: Spark Issue Type: Improvement Components: Input/Output, Streaming Affects Versions: 1.6.1 Reporter: Stas Levin Priority: Minor Hi, It seems that quite a few InputStreams currently leave the start and stop methods empty. I was hoping to hear your thoughts on: 1. Were there any particular reasons to leave these methods empty? 2. Do the stop/start semantics of InputStreams aim to support pause-resume use cases, or is stopping a one-way ticket? A pause-resume capability could be really useful for cases where one wishes to load new offline data for the streaming app to run on top of, without restarting the entire app. Thanks a lot, Stas
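The distinction the question raises (a terminal stop/start lifecycle versus a reversible pause/resume one) can be sketched abstractly. The class below is a hypothetical toy illustration, not Spark's InputDStream API:

```python
class ToyReceiver:
    """Hypothetical receiver contrasting the two lifecycles discussed above:
    stop() as a one-way ticket vs. pause()/resume() as a reversible state."""

    def __init__(self):
        self.state = "created"

    def start(self):
        if self.state == "stopped":
            # one-way semantics: once stopped, the receiver is done
            raise RuntimeError("stop() is terminal; a stopped receiver cannot restart")
        self.state = "running"

    def pause(self):
        if self.state == "running":
            self.state = "paused"   # reversible: new offline data could be loaded here

    def resume(self):
        if self.state == "paused":
            self.state = "running"

    def stop(self):
        self.state = "stopped"      # terminal

r = ToyReceiver()
r.start(); r.pause(); r.resume()
print(r.state)  # running
r.stop()
```

Under pause-resume semantics the app could swap in fresh offline data while paused; under terminal stop semantics the whole app must be restarted, which is exactly the cost the reporter wants to avoid.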
[jira] [Commented] (SPARK-14898) MultivariateGaussian could use Cholesky in calculateCovarianceConstants
[ https://issues.apache.org/jira/browse/SPARK-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276632#comment-15276632 ] Miao Wang commented on SPARK-14898: --- I agree. I tried to understand what I should do in this JIRA. It seems that we don't have to change anything in terms of using Cholesky. Thanks! > MultivariateGaussian could use Cholesky in calculateCovarianceConstants > --- > > Key: SPARK-14898 > URL: https://issues.apache.org/jira/browse/SPARK-14898 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > In spark.ml.stat.distribution.MultivariateGaussian, > calculateCovarianceConstants uses SVD. It might be more efficient to use > Cholesky. We should check other numerical libraries and see if we should > switch to Cholesky. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
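For intuition, here is a pure-Python sketch (not Spark code) of why Cholesky is attractive for this computation, under the assumption that the covariance matrix is strictly positive definite; handling the singular case via a pseudo-determinant is one reason an SVD is used today. The log-determinant that a routine like calculateCovarianceConstants needs falls straight out of the Cholesky factor's diagonal:

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric
    positive-definite matrix a, so that a = L * L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

# Made-up 2x2 covariance matrix with determinant 4*3 - 2*2 = 8.
cov = [[4.0, 2.0], [2.0, 3.0]]
l = cholesky(cov)

# log|Sigma| = 2 * sum(log L_ii): one triangular factorization,
# typically cheaper in practice than a full SVD of the same matrix.
log_det = 2.0 * sum(math.log(l[i][i]) for i in range(len(l)))
print(round(log_det, 6))  # 2.079442 == log(8)
```

The same factor also gives triangular solves for the Mahalanobis term, so both constants come from one decomposition.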
[jira] [Closed] (SPARK-15122) TPC-DS Query 41 fails with The correlated scalar subquery can only contain equality predicates
[ https://issues.apache.org/jira/browse/SPARK-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JESSE CHEN closed SPARK-15122. -- Verified successfully in 0508 build. Thanks! > TPC-DS Qury 41 fails with The correlated scalar subquery can only contain > equality predicates > - > > Key: SPARK-15122 > URL: https://issues.apache.org/jira/browse/SPARK-15122 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: JESSE CHEN >Assignee: Herman van Hovell >Priority: Critical > Fix For: 2.0.0 > > > The official TPC-DS query 41 fails with the following error: > {noformat} > Error in query: The correlated scalar subquery can only contain equality > predicates: (((i_manufact#38 = i_manufact#16) && (i_category#36 = Women) > && ((i_color#41 = powder) || (i_color#41 = khaki))) && (((i_units#42 = Ounce) > || (i_units#42 = Oz)) && ((i_size#39 = medium) || (i_size#39 = extra > large || (((i_category#36 = Women) && ((i_color#41 = brown) || > (i_color#41 = honeydew))) && (((i_units#42 = Bunch) || (i_units#42 = Ton)) && > ((i_size#39 = N/A) || (i_size#39 = small) || i_category#36 = Men) && > ((i_color#41 = floral) || (i_color#41 = deep))) && (((i_units#42 = N/A) || > (i_units#42 = Dozen)) && ((i_size#39 = petite) || (i_size#39 = large || > (((i_category#36 = Men) && ((i_color#41 = light) || (i_color#41 = > cornflower))) && (((i_units#42 = Box) || (i_units#42 = Pound)) && ((i_size#39 > = medium) || (i_size#39 = extra large))) || ((i_manufact#38 = > i_manufact#16) && (i_category#36 = Women) && ((i_color#41 = midnight) || > (i_color#41 = snow))) && (((i_units#42 = Pallet) || (i_units#42 = Gross)) && > ((i_size#39 = medium) || (i_size#39 = extra large || (((i_category#36 = > Women) && ((i_color#41 = cyan) || (i_color#41 = papaya))) && (((i_units#42 = > Cup) || (i_units#42 = Dram)) && ((i_size#39 = N/A) || (i_size#39 = small) > || i_category#36 = Men) && ((i_color#41 = orange) || (i_color#41 = > frosted))) && (((i_units#42 = Each) || 
(i_units#42 = Tbl)) && ((i_size#39 = > petite) || (i_size#39 = large || (((i_category#36 = Men) && ((i_color#41 > = forest) || (i_color#41 = ghost))) && (((i_units#42 = Lb) || (i_units#42 = > Bundle)) && ((i_size#39 = medium) || (i_size#39 = extra large; > {noformat} > The output plans showed the following errors > {noformat} > == Parsed Logical Plan == > 'GlobalLimit 100 > +- 'LocalLimit 100 >+- 'Sort ['i_product_name ASC], true > +- 'Distinct > +- 'Project ['i_product_name] > +- 'Filter ((('i_manufact_id >= 738) && ('i_manufact_id <= (738 + > 40))) && (scalar-subquery#1 [] > 0)) >: +- 'SubqueryAlias scalar-subquery#1 [] >: +- 'Project ['count(1) AS item_cnt#0] >:+- 'Filter ((('i_manufact = 'i1.i_manufact) && > ('i_category = Women) && (('i_color = powder) || ('i_color = khaki))) && > ((('i_units = Ounce) || ('i_units = Oz)) && (('i_size = medium) || ('i_size = > extra large || ((('i_category = Women) && (('i_color = brown) || > ('i_color = honeydew))) && ((('i_units = Bunch) || ('i_units = Ton)) && > (('i_size = N/A) || ('i_size = small) || 'i_category = Men) && > (('i_color = floral) || ('i_color = deep))) && ((('i_units = N/A) || > ('i_units = Dozen)) && (('i_size = petite) || ('i_size = large || > ((('i_category = Men) && (('i_color = light) || ('i_color = cornflower))) && > ((('i_units = Box) || ('i_units = Pound)) && (('i_size = medium) || ('i_size > = extra large))) || (('i_manufact = 'i1.i_manufact) && ('i_category = > Women) && (('i_color = midnight) || ('i_color = snow))) && ((('i_units = > Pallet) || ('i_units = Gross)) && (('i_size = medium) || ('i_size = extra > large || ((('i_category = Women) && (('i_color = cyan) || ('i_color = > papaya))) && ((('i_units = Cup) || ('i_units = Dram)) && (('i_size = N/A) || > ('i_size = small) || 'i_category = Men) && (('i_color = orange) || > ('i_color = frosted))) && ((('i_units = Each) || ('i_units = Tbl)) && > (('i_size = petite) || ('i_size = large || ((('i_category = Men) && > (('i_color = forest) || ('i_color 
= ghost))) && ((('i_units = Lb) || > ('i_units = Bundle)) && (('i_size = medium) || ('i_size = extra large >: +- 'UnresolvedRelation `item`, None >+- 'UnresolvedRelation `item`, Some(i1) > == Analyzed Logical Plan == > i_product_name: string > GlobalLimit 100 > +- LocalLimit 100 >+- Sort [i_product_name#24 ASC], true > +- Distinct > +- Project [i_product_name#24] > +- Filter (((i_manufact_id#16L >= cast(738 as bigint)) && >
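For readers unfamiliar with the construct, the failing query's shape is a correlated scalar subquery: a subquery that references a column of the outer row and returns a single value. A minimal stand-alone illustration (using Python's sqlite3 and made-up data, where Spark's equality-predicate restriction does not apply) looks like:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (i_manufact INT, i_color TEXT);
INSERT INTO item VALUES (1, 'powder'), (1, 'khaki'), (2, 'forest');
""")

# Correlated scalar subquery: for each outer row i1, count the rows
# sharing its i_manufact (the correlated *equality* predicate, which
# Spark 2.0 required) combined with disjunctions on other columns
# (the part that, at scale, triggered the reported analysis error).
rows = con.execute("""
SELECT i1.i_manufact,
       (SELECT COUNT(*) FROM item
        WHERE i_manufact = i1.i_manufact
          AND (i_color = 'powder' OR i_color = 'khaki')) AS item_cnt
FROM item i1
""").fetchall()
print(sorted(rows))  # [(1, 2), (1, 2), (2, 0)]
```

Query 41 follows this pattern with the subquery's count compared against 0 in the outer filter; the analyzer error above came from how the large OR/AND predicate interacted with the correlated equality condition.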
[jira] [Commented] (SPARK-15122) TPC-DS Query 41 fails with The correlated scalar subquery can only contain equality predicates
[ https://issues.apache.org/jira/browse/SPARK-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276600#comment-15276600 ] JESSE CHEN commented on SPARK-15122: works great! now all 99 queries pass! nicely done!
[jira] [Updated] (SPARK-15154) LongHashedRelation test fails on Big Endian platform
[ https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pete Robbins updated SPARK-15154: - Priority: Minor (was: Major) Summary: LongHashedRelation test fails on Big Endian platform (was: LongHashedRelation fails on Big Endian platform) > LongHashedRelation test fails on Big Endian platform > > > Key: SPARK-15154 > URL: https://issues.apache.org/jira/browse/SPARK-15154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Pete Robbins >Priority: Minor > Labels: big-endian > > NPE in > org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap > Error Message > java.lang.NullPointerException was thrown. > Stacktrace > java.lang.NullPointerException > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29) > at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) > at > org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528) > at > 
org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526) > at > org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29) > at org.scalatest.Suite$class.run(Suite.scala:1421) > at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29) > at
[jira] [Commented] (SPARK-15154) LongHashedRelation fails on Big Endian platform
[ https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276552#comment-15276552 ] Apache Spark commented on SPARK-15154: -- User 'robbinspg' has created a pull request for this issue: https://github.com/apache/spark/pull/13009
[jira] [Assigned] (SPARK-15154) LongHashedRelation fails on Big Endian platform
[ https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15154: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-15154) LongHashedRelation fails on Big Endian platform
[ https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15154: Assignee: Apache Spark
[jira] [Commented] (SPARK-15154) LongHashedRelation fails on Big Endian platform
[ https://issues.apache.org/jira/browse/SPARK-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276543#comment-15276543 ] Pete Robbins commented on SPARK-15154: -- I'm convinced the test is invalid. The creation of LongHashedRelation is guarded by {code} if (key.length == 1 && key.head.dataType == LongType) { LongHashedRelation(input, key, sizeEstimate, mm) } {code} In this failing test the key dataType is IntegerType I'll submit a PR to fix the tests > LongHashedRelation fails on Big Endian platform > --- > > Key: SPARK-15154 > URL: https://issues.apache.org/jira/browse/SPARK-15154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Pete Robbins > Labels: big-endian > > NPE in > org.apache.spark.sql.execution.joins.HashedRelationSuite.LongToUnsafeRowMap > Error Message > java.lang.NullPointerException was thrown. > Stacktrace > java.lang.NullPointerException > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(HashedRelationSuite.scala:121) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply$mcV$sp(HashedRelationSuite.scala:119) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112) > at > org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$3.apply(HashedRelationSuite.scala:112) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57) > at > 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:29) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:29) > at 
org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) > at > org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528) > at > org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526) > at > org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29) > at
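The guard quoted in the comment above only admits a single LongType key, while the failing test feeds it an IntegerType key. A rough Python sketch (using the standard `struct` module as a stand-in for Spark's row-buffer reads; none of this is Spark code) shows why reading a 4-byte integer through an 8-byte long code path can pass by accident on little-endian machines and fail on big-endian ones:

```python
import struct

# A 4-byte integer value followed by 4 bytes of padding, as it might sit
# in a row buffer. An 8-byte read over these bytes recovers 42 only under
# little-endian byte order; under big-endian the value bytes land in the
# high-order positions and the result is an unrelated huge number.
buf = struct.pack("<i", 42) + b"\x00\x00\x00\x00"

little = struct.unpack("<q", buf)[0]  # little-endian 8-byte read
big = struct.unpack(">q", buf)[0]     # big-endian 8-byte read

print(little, big)  # little == 42; big == 42 << 56
```

This is consistent with the comment's diagnosis: the test, not the platform, is at fault, because it exercises the long-keyed path with a non-long key.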
[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage
[ https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276497#comment-15276497 ] Yanbo Liang commented on SPARK-14813: - [~holdenk] I'm not working on regression, please go ahead. I will open JIRAs if I start my work under this topic. > ML 2.0 QA: API: Python API coverage > --- > > Key: SPARK-14813 > URL: https://issues.apache.org/jira/browse/SPARK-14813 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, PySpark >Reporter: Joseph K. Bradley >Assignee: holdenk > > For new public APIs added to MLlib, we need to check the generated HTML doc > and compare the Scala & Python versions. We need to track: > * Inconsistency: Do class/method/parameter names match? > * Docs: Is the Python doc missing or just a stub? We want the Python doc to > be as complete as the Scala doc. > * API breaking changes: These should be very rare but are occasionally either > necessary (intentional) or accidental. These must be recorded and added in > the Migration Guide for this release. > ** Note: If the API change is for an Alpha/Experimental/DeveloperApi > component, please note that as well. > * Missing classes/methods/parameters: We should create to-do JIRAs for > functionality missing from Python, to be added in the next release cycle. > Please use a *separate* JIRA (linked below as "requires") for this list of > to-do items. > UPDATE: This only needs to cover spark.ml since spark.mllib is going into > maintenance mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14459) SQL partitioning must match existing tables, but is not checked.
[ https://issues.apache.org/jira/browse/SPARK-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276493#comment-15276493 ] Ryan Blue commented on SPARK-14459: --- Thank you [~lian cheng]! > SQL partitioning must match existing tables, but is not checked. > > > Key: SPARK-14459 > URL: https://issues.apache.org/jira/browse/SPARK-14459 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue > Fix For: 2.0.0 > > > Writing into partitioned Hive tables has unexpected results because the > table's partitioning is not detected and applied during the analysis phase. > For example, if I have two tables, {{source}} and {{partitioned}}, with the > same column types: > {code} > CREATE TABLE source (id bigint, data string, part string); > CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part > string); > // copy from source to partitioned > sqlContext.table("source").write.insertInto("partitioned") > {code} > Copying from {{source}} to {{partitioned}} succeeds, but results in 0 rows. > This works if I explicitly partition by adding > {{...write.partitionBy("part").insertInto(...)}}. This work-around isn't > obvious and is prone to error because the {{partitionBy}} must match the > table's partitioning, though it is not checked. > I think when relations are resolved, the partitioning should be checked and > updated if it isn't set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
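The fix the ticket proposes is an analysis-time check that the write's partitioning matches the target table's. A minimal sketch of that check in plain Python (the helper name and shapes are invented for illustration; this is not Spark's analyzer API):

```python
def check_partitioning(table_partition_cols, write_partition_cols):
    # Reject a write whose partition columns do not match the table's
    # declared partitioning, instead of silently writing 0 rows.
    if list(write_partition_cols) != list(table_partition_cols):
        raise ValueError(
            "Partitioning mismatch: table is partitioned by %r, "
            "but the write specified %r"
            % (list(table_partition_cols), list(write_partition_cols)))

check_partitioning(["part"], ["part"])  # matching case: no error
try:
    check_partitioning(["part"], [])    # the silent 0-rows case from the report
except ValueError as e:
    print(e)
```

The second call models the unchecked `insertInto("partitioned")` without `partitionBy("part")`; surfacing an error here is the behaviour the ticket asks for.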
[jira] [Commented] (SPARK-15192) RowEncoder needs to verify nullability in a more explicit way
[ https://issues.apache.org/jira/browse/SPARK-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276478#comment-15276478 ] Apache Spark commented on SPARK-15192: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/13008 > RowEncoder needs to verify nullability in a more explicit way > - > > Key: SPARK-15192 > URL: https://issues.apache.org/jira/browse/SPARK-15192 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai > > When we create a Dataset from an RDD of rows with a specific schema, if the > nullability of a value does not match the nullability defined in the schema, > we will throw an exception that is not easy to understand. > It will be good to verify the nullability in a more explicit way. > {code} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val schema = new StructType().add("a", StringType, false).add("b", > StringType, false) > val rdd = sc.parallelize(Row(null, "123") :: Row("234", null) :: Nil) > spark.createDataFrame(rdd, schema).show > {code} > {noformat} > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException > createexternalrow(if (isnull(input[0, string])) null else input[0, > string].toString, if (isnull(input[1, string])) null else input[1, > string].toString, StructField(a,StringType,false), > StructField(b,StringType,false)) > :- if (isnull(input[0, string])) null else input[0, string].toString > : :- isnull(input[0, string]) > : : +- input[0, string] > : :- null > : +- input[0, string].toString > : +- input[0, string] > +- if (isnull(input[1, string])) null else input[1, string].toString >:- isnull(input[1, string]) >: +- input[1, string] >:- null >+- input[1, string].toString > +- input[1, string] > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:244) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2119) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2407) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2118) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2125) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1859) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2437) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2075) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:239) > at org.apache.spark.sql.Dataset.show(Dataset.scala:530) > at org.apache.spark.sql.Dataset.show(Dataset.scala:490) > at org.apache.spark.sql.Dataset.show(Dataset.scala:499) > ... 
50 elided > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241) > ... 72 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15192) RowEncoder needs to verify nullability in a more explicit way
[ https://issues.apache.org/jira/browse/SPARK-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15192: Assignee: (was: Apache Spark) > RowEncoder needs to verify nullability in a more explicit way > - > > Key: SPARK-15192 > URL: https://issues.apache.org/jira/browse/SPARK-15192 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai > > When we create a Dataset from an RDD of rows with a specific schema, if the > nullability of a value does not match the nullability defined in the schema, > we will throw an exception that is not easy to understand. > It will be good to verify the nullability in a more explicit way. > {code} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val schema = new StructType().add("a", StringType, false).add("b", > StringType, false) > val rdd = sc.parallelize(Row(null, "123") :: Row("234", null) :: Nil) > spark.createDataFrame(rdd, schema).show > {code} > {noformat} > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException > createexternalrow(if (isnull(input[0, string])) null else input[0, > string].toString, if (isnull(input[1, string])) null else input[1, > string].toString, StructField(a,StringType,false), > StructField(b,StringType,false)) > :- if (isnull(input[0, string])) null else input[0, string].toString > : :- isnull(input[0, string]) > : : +- input[0, string] > : :- null > : +- input[0, string].toString > : +- input[0, string] > +- if (isnull(input[1, string])) null else input[1, string].toString >:- isnull(input[1, string]) >: +- input[1, string] >:- null >+- input[1, string].toString > +- input[1, string] > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:244) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2119) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2407) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2118) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2125) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1859) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2437) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2075) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:239) > at org.apache.spark.sql.Dataset.show(Dataset.scala:530) > at org.apache.spark.sql.Dataset.show(Dataset.scala:490) > at org.apache.spark.sql.Dataset.show(Dataset.scala:499) > ... 50 elided > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241) > ... 
72 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15192) RowEncoder needs to verify nullability in a more explicit way
[ https://issues.apache.org/jira/browse/SPARK-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15192: Assignee: Apache Spark > RowEncoder needs to verify nullability in a more explicit way > - > > Key: SPARK-15192 > URL: https://issues.apache.org/jira/browse/SPARK-15192 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Apache Spark > > When we create a Dataset from an RDD of rows with a specific schema, if the > nullability of a value does not match the nullability defined in the schema, > we will throw an exception that is not easy to understand. > It will be good to verify the nullability in a more explicit way. > {code} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val schema = new StructType().add("a", StringType, false).add("b", > StringType, false) > val rdd = sc.parallelize(Row(null, "123") :: Row("234", null) :: Nil) > spark.createDataFrame(rdd, schema).show > {code} > {noformat} > java.lang.RuntimeException: Error while decoding: > java.lang.NullPointerException > createexternalrow(if (isnull(input[0, string])) null else input[0, > string].toString, if (isnull(input[1, string])) null else input[1, > string].toString, StructField(a,StringType,false), > StructField(b,StringType,false)) > :- if (isnull(input[0, string])) null else input[0, string].toString > : :- isnull(input[0, string]) > : : +- input[0, string] > : :- null > : +- input[0, string].toString > : +- input[0, string] > +- if (isnull(input[1, string])) null else input[1, string].toString >:- isnull(input[1, string]) >: +- input[1, string] >:- null >+- input[1, string].toString > +- input[1, string] > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:244) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2119) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2407) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2118) > at > org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2125) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1859) > at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2437) > at org.apache.spark.sql.Dataset.head(Dataset.scala:1858) > at org.apache.spark.sql.Dataset.take(Dataset.scala:2075) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:239) > at org.apache.spark.sql.Dataset.show(Dataset.scala:530) > at org.apache.spark.sql.Dataset.show(Dataset.scala:490) > at org.apache.spark.sql.Dataset.show(Dataset.scala:499) > ... 50 elided > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241) > ... 
72 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
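The explicit verification requested above can be sketched in plain Python: walk the row against the schema's nullability flags and fail with a readable message before any decoding happens. The function and schema shape here are stand-ins for illustration, not Spark's RowEncoder API:

```python
def verify_nullability(schema, row):
    # schema: list of (column_name, nullable) pairs, mimicking a StructType.
    # Raise a descriptive error instead of letting a NullPointerException
    # surface from deep inside generated decoding code.
    for (name, nullable), value in zip(schema, row):
        if value is None and not nullable:
            raise ValueError(
                "Column %r is declared non-nullable but the row "
                "contains a null value" % name)

schema = [("a", False), ("b", False)]
verify_nullability(schema, ("234", "123"))   # valid row: passes silently
try:
    verify_nullability(schema, (None, "123"))  # the case from the report
except ValueError as e:
    print(e)
```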
[jira] [Assigned] (SPARK-15226) CSV file data-line with newline at first line load error
[ https://issues.apache.org/jira/browse/SPARK-15226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15226: Assignee: Apache Spark > CSV file data-line with newline at first line load error > > > Key: SPARK-15226 > URL: https://issues.apache.org/jira/browse/SPARK-15226 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.0 >Reporter: Weichen Xu >Assignee: Apache Spark > Original Estimate: 24h > Remaining Estimate: 24h > > CSV file such as: > --- > v1,v2,"v > 3",v4,v5 > a,b,c,d,e > --- > it contains two row,first row : > v1, v2, v\n3, v4, v5 (in value v\n3 it contains a newline character,it is > legal) > second row: > a,b,c,d,e > then in spark-shell run commands like: > val sqlContext = new org.apache.spark.sql.SQLContext(sc); > var reader = sqlContext.read > var df = reader.csv("path/to/csvfile") > df.collect > then we find the load data is wrong, > the load data has only 3 columns, but in fact it has 5 columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15226) CSV file data-line with newline at first line load error
[ https://issues.apache.org/jira/browse/SPARK-15226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15226: Assignee: (was: Apache Spark) > CSV file data-line with newline at first line load error > > > Key: SPARK-15226 > URL: https://issues.apache.org/jira/browse/SPARK-15226 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.0 >Reporter: Weichen Xu > Original Estimate: 24h > Remaining Estimate: 24h > > CSV file such as: > --- > v1,v2,"v > 3",v4,v5 > a,b,c,d,e > --- > it contains two row,first row : > v1, v2, v\n3, v4, v5 (in value v\n3 it contains a newline character,it is > legal) > second row: > a,b,c,d,e > then in spark-shell run commands like: > val sqlContext = new org.apache.spark.sql.SQLContext(sc); > var reader = sqlContext.read > var df = reader.csv("path/to/csvfile") > df.collect > then we find the load data is wrong, > the load data has only 3 columns, but in fact it has 5 columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15226) CSV file data-line with newline at first line load error
[ https://issues.apache.org/jira/browse/SPARK-15226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276390#comment-15276390 ] Apache Spark commented on SPARK-15226: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/13007 > CSV file data-line with newline at first line load error > > > Key: SPARK-15226 > URL: https://issues.apache.org/jira/browse/SPARK-15226 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.0 >Reporter: Weichen Xu > Original Estimate: 24h > Remaining Estimate: 24h > > CSV file such as: > --- > v1,v2,"v > 3",v4,v5 > a,b,c,d,e > --- > it contains two row,first row : > v1, v2, v\n3, v4, v5 (in value v\n3 it contains a newline character,it is > legal) > second row: > a,b,c,d,e > then in spark-shell run commands like: > val sqlContext = new org.apache.spark.sql.SQLContext(sc); > var reader = sqlContext.read > var df = reader.csv("path/to/csvfile") > df.collect > then we find the load data is wrong, > the load data has only 3 columns, but in fact it has 5 columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15226) CSV file data-line with newline at first line load error
Weichen Xu created SPARK-15226: -- Summary: CSV file data-line with newline at first line load error Key: SPARK-15226 URL: https://issues.apache.org/jira/browse/SPARK-15226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0, 2.1.0 Reporter: Weichen Xu CSV file such as: --- v1,v2,"v 3",v4,v5 a,b,c,d,e --- it contains two row,first row : v1, v2, v\n3, v4, v5 (in value v\n3 it contains a newline character,it is legal) second row: a,b,c,d,e then in spark-shell run commands like: val sqlContext = new org.apache.spark.sql.SQLContext(sc); var reader = sqlContext.read var df = reader.csv("path/to/csvfile") df.collect then we find the load data is wrong, the load data has only 3 columns, but in fact it has 5 columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
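The parse the report expects is standard CSV quoting: a newline inside a quoted field belongs to the field, not to the record separator. Python's `csv` module (shown here only to illustrate the expected result, not Spark's reader) handles the example from the ticket correctly:

```python
import csv
import io

# The ticket's example: the first record's third field contains a newline
# inside quotes, so the file holds two records of five fields each.
data = 'v1,v2,"v\n3",v4,v5\na,b,c,d,e\n'

rows = list(csv.reader(io.StringIO(data)))
print(rows)
# [['v1', 'v2', 'v\n3', 'v4', 'v5'], ['a', 'b', 'c', 'd', 'e']]
```

A reader that splits on raw newlines before quote-aware parsing sees three truncated columns for the first record, which matches the wrong output described above.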
[jira] [Updated] (SPARK-15188) PySpark NaiveBayes is missing Thresholds param
[ https://issues.apache.org/jira/browse/SPARK-15188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15188: --- Assignee: holdenk > PySpark NaiveBayes is missing Thresholds param > -- > > Key: SPARK-15188 > URL: https://issues.apache.org/jira/browse/SPARK-15188 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: holdenk >Assignee: holdenk >Priority: Trivial > > NaiveBayes in Python is missing thresholds param. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15188) PySpark NaiveBayes is missing Thresholds param
[ https://issues.apache.org/jira/browse/SPARK-15188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-15188: --- Summary: PySpark NaiveBayes is missing Thresholds param (was: NaiveBayes is missing Thresholds param) > PySpark NaiveBayes is missing Thresholds param > -- > > Key: SPARK-15188 > URL: https://issues.apache.org/jira/browse/SPARK-15188 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: holdenk >Priority: Trivial > > NaiveBayes in Python is missing thresholds param. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name
[ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276275#comment-15276275 ] Weichen Xu commented on SPARK-15212: en... but it may still cause problems; for example, the csv file header may contain a ` character, such as: col`1,col2,... So it is better to add a check that each column name read from the file is legal. > CSV file reader when read file with first line schema do not filter blank in > schema column name > --- > > Key: SPARK-15212 > URL: https://issues.apache.org/jira/browse/SPARK-15212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.0 >Reporter: Weichen Xu >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > for example, run the following code in spark-shell, > val sqlContext = new org.apache.spark.sql.SQLContext(sc); > var reader = sqlContext.read > reader.option("header", true) > var df = reader.csv("file:///diskext/tdata/spark/d1.csv") > when the csv data file contains: > -- > col1, col2,col3,col4,col5 > 1997,Ford,E350,"ac, abs, moon",3000.00 > > > the first line contains schema, the col2 has a blank before it, > then the generated DataFrame's schema column name contains the blank. > This may cause potential problem for example > df.select("col2") > can't find the column, must use > df.select(" col2") > and if register the dataframe as a table, then do query, can't select col2. > df.registerTempTable("tab1"); > sqlContext.sql("select col2 from tab1"); //will fail > must add a column name validate when load csv file with schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
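Both points in this thread (trimming the stray blank and rejecting names that would need escaping) can be combined into one header-sanitising step. A hypothetical sketch in plain Python; the helper and its rules are invented for illustration and are not Spark's CSV reader:

```python
def sanitize_header(raw_cols):
    # Trim surrounding whitespace from each header cell, then reject
    # names that are empty or contain characters that would require
    # escaping in SQL (space or backtick).
    cleaned = [c.strip() for c in raw_cols]
    for c in cleaned:
        if not c or "`" in c or " " in c:
            raise ValueError("illegal column name: %r" % c)
    return cleaned

print(sanitize_header(["col1", " col2", "col3"]))  # ' col2' becomes 'col2'
try:
    sanitize_header(["col`1", "col2"])  # the backtick case from the comment
except ValueError as e:
    print(e)
```

With the leading blank trimmed, `df.select("col2")` and `select col2 from tab1` in the quoted example would resolve as the user expects.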
[jira] [Commented] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276190#comment-15276190 ] Apache Spark commented on SPARK-15223: -- User 'ashishawasthi' has created a pull request for this issue: https://github.com/apache/spark/pull/13004 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Trivial > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15223: -- Priority: Trivial (was: Major) Not Major, and too Trivial even for a JIRA > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Priority: Trivial > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
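Beyond fixing the documentation, a common pattern for surviving a configuration rename like this is a deprecation table consulted at lookup time. A minimal sketch; the helper is invented for illustration and is not Spark's `SparkConf` API:

```python
# Map each retired configuration key to its current name.
DEPRECATED_KEYS = {
    "spark.executor.logs.rolling.size.maxBytes":
        "spark.executor.logs.rolling.maxSize",
}

def get_conf(conf, key):
    # Redirect a deprecated key to its current name before the lookup,
    # so configs written against old docs keep working.
    return conf.get(DEPRECATED_KEYS.get(key, key))

conf = {"spark.executor.logs.rolling.maxSize": "134217728"}
print(get_conf(conf, "spark.executor.logs.rolling.size.maxBytes"))
print(get_conf(conf, "spark.executor.logs.rolling.maxSize"))
```

Both lookups resolve to the same stored value, whether the caller uses the old or the new key.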
[jira] [Assigned] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
[ https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15225: Assignee: (was: Apache Spark) > Replace SQLContext with SparkSession in Encoder documentation > - > > Key: SPARK-15225 > URL: https://issues.apache.org/jira/browse/SPARK-15225 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Liang-Chi Hsieh >Priority: Minor > > Encoder's doc mentions sqlContext.implicits._. We should use > sparkSession.implicits._ instead now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
[ https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15225: Assignee: Apache Spark > Replace SQLContext with SparkSession in Encoder documentation > - > > Key: SPARK-15225 > URL: https://issues.apache.org/jira/browse/SPARK-15225 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark >Priority: Minor > > Encoder's doc mentions sqlContext.implicits._. We should use > sparkSession.implicits._ instead now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
[ https://issues.apache.org/jira/browse/SPARK-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276159#comment-15276159 ] Apache Spark commented on SPARK-15225: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/13002 > Replace SQLContext with SparkSession in Encoder documentation > - > > Key: SPARK-15225 > URL: https://issues.apache.org/jira/browse/SPARK-15225 > Project: Spark > Issue Type: Improvement > Components: Documentation >Reporter: Liang-Chi Hsieh >Priority: Minor > > Encoder's doc mentions sqlContext.implicits._. We should use > sparkSession.implicits._ instead now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15225) Replace SQLContext with SparkSession in Encoder documentation
Liang-Chi Hsieh created SPARK-15225: --- Summary: Replace SQLContext with SparkSession in Encoder documentation Key: SPARK-15225 URL: https://issues.apache.org/jira/browse/SPARK-15225 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Liang-Chi Hsieh Priority: Minor Encoder's doc mentions sqlContext.implicits._. We should use sparkSession.implicits._ instead now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15223: Assignee: (was: Apache Spark) > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276156#comment-15276156 ] Apache Spark commented on SPARK-15223: -- User 'philipphoffmann' has created a pull request for this issue: https://github.com/apache/spark/pull/13001 > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
[ https://issues.apache.org/jira/browse/SPARK-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15223: Assignee: Apache Spark > spark.executor.logs.rolling.maxSize wrongly referred to as > spark.executor.logs.rolling.size.maxBytes > > > Key: SPARK-15223 > URL: https://issues.apache.org/jira/browse/SPARK-15223 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann >Assignee: Apache Spark > > The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was > changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is > however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15224) Can not delete jar and list jar in spark Thrift server
[ https://issues.apache.org/jira/browse/SPARK-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] poseidon updated SPARK-15224: - Description: When you try to delete a jar and exec delete jar or list jar in your Beeline client, it throws an exception:
delete jar;
Error: org.apache.spark.sql.AnalysisException: line 1:7 missing FROM at 'jars' near 'jars' line 1:12 missing EOF at 'myudfs' near 'jars'; (state=,code=0)
list jar;
Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 'list' 'jars' ''; line 1 pos 0 (state=,code=0)
{code:title=funnlog.log|borderStyle=solid}
16/05/09 17:26:52 INFO thriftserver.SparkExecuteStatementOperation: Running query 'list jar' with 1da09765-efb4-42dc-8890-3defca40f89d
16/05/09 17:26:52 INFO parse.ParseDriver: Parsing command: list jar
NoViableAltException(26@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1071)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.spark.sql.hive.HiveQl$.getAst(HiveQl.scala:276)
	at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:303)
	at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
	at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
	at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
	at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
	at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
	at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
	at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:295)
	at org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
	at org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
	at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:240)
	at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:239)
	at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:282)
	at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:65)
	at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
	at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
	at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
	at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
	at
[jira] [Created] (SPARK-15223) spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes
Philipp Hoffmann created SPARK-15223: Summary: spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes Key: SPARK-15223 URL: https://issues.apache.org/jira/browse/SPARK-15223 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.6.1 Reporter: Philipp Hoffmann The configuration setting {{spark.executor.logs.rolling.size.maxBytes}} was changed to {{spark.executor.logs.rolling.maxSize}} in 1.4 or so. There is however still a reference in the documentation using the old name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
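Editor's note: the fix is documentation-only, but for users hit by the rename: on Spark 1.4+ the old key is silently ignored and only the new name takes effect. A minimal {{spark-defaults.conf}} fragment (values are illustrative):

```properties
# Roll executor logs by size. The old key
# spark.executor.logs.rolling.size.maxBytes no longer works on Spark 1.4+.
spark.executor.logs.rolling.strategy          size
spark.executor.logs.rolling.maxSize           134217728
spark.executor.logs.rolling.maxRetainedFiles  5
```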
[jira] [Created] (SPARK-15224) Can not delete jar and list jar in spark Thrift server
poseidon created SPARK-15224: Summary: Can not delete jar and list jar in spark Thrift server Key: SPARK-15224 URL: https://issues.apache.org/jira/browse/SPARK-15224 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1 Environment: spark 1.6.1 hive 1.2.1 hdfs 2.7.1 Reporter: poseidon When you try to delete a jar and exec delete jar or list jar in your Beeline client, it throws an exception: delete jar; Error: org.apache.spark.sql.AnalysisException: line 1:7 missing FROM at 'jars' near 'jars' line 1:12 missing EOF at 'myudfs' near 'jars'; (state=,code=0) list jar; Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 'list' 'jars' ''; line 1 pos 0 (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15192) RowEncoder needs to verify nullability in a more explicit way
[ https://issues.apache.org/jira/browse/SPARK-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15192: --- Description: When we create a Dataset from an RDD of rows with a specific schema, if the nullability of a value does not match the nullability defined in the schema, we will throw an exception that is not easy to understand. It will be good to verify the nullability in a more explicit way.
{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = new StructType().add("a", StringType, false).add("b", StringType, false)
val rdd = sc.parallelize(Row(null, "123") :: Row("234", null) :: Nil)
spark.createDataFrame(rdd, schema).show
{code}
{noformat}
java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(if (isnull(input[0, string])) null else input[0, string].toString, if (isnull(input[1, string])) null else input[1, string].toString, StructField(a,StringType,false), StructField(b,StringType,false))
:- if (isnull(input[0, string])) null else input[0, string].toString
:  :- isnull(input[0, string])
:  :  +- input[0, string]
:  :- null
:  +- input[0, string].toString
:     +- input[0, string]
+- if (isnull(input[1, string])) null else input[1, string].toString
   :- isnull(input[1, string])
   :  +- input[1, string]
   :- null
   +- input[1, string].toString
      +- input[1, string]

	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:244)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1$$anonfun$apply$13.apply(Dataset.scala:2119)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2119)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2407)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2118)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2125)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1859)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1858)
	at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2437)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:1858)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2075)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:239)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:530)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:490)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:499)
	... 50 elided
Caused by: java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.fromRow(ExpressionEncoder.scala:241)
	... 72 more
{noformat}
was: When we create a Dataset from an RDD of rows with a specific schema, if the nullability of a value does not match the nullability defined in the schema, we will throw an exception that is not easy to understand. It will be good to verify the nullability in a more explicit way.
{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = new StructType().add("a", StringType, false).add("b", StringType, false)
val rdd = sc.parallelize(Row(null, "123") :: Row("234", null) :: Nil)
spark.createDataFrame(rdd, schema).show

java.lang.RuntimeException: Error while decoding: java.lang.NullPointerException
createexternalrow(if (isnull(input[0, string])) null else input[0, string].toString, if (isnull(input[1, string])) null else input[1, string].toString, StructField(a,StringType,false), StructField(b,StringType,false))
:- if (isnull(input[0, string])) null else input[0, string].toString
:  :- isnull(input[0, string])
:  :  +- input[0, string]
:  :- null
:  +- input[0, string].toString
:     +- input[0, string]
+- if (isnull(input[1, string])) null else input[1, string].toString
   :- isnull(input[1, string])
   :  +- input[1, string]
   :- null
   +- input[1, string].toString
      +- input[1, string]
	at
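Editor's note: until the encoder produces a friendlier error, callers can verify nullability up front themselves. A hedged sketch (the helper name {{verifyNullability}} is ours, not a Spark API, and this needs a Spark SQL dependency to compile):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// Illustrative pre-check: fail fast with a readable message instead of the
// NullPointerException buried inside the generated decoder.
def verifyNullability(row: Row, schema: StructType): Unit =
  schema.fields.zipWithIndex.foreach { case (field, i) =>
    if (!field.nullable && row.isNullAt(i))
      throw new IllegalArgumentException(
        s"Column '${field.name}' is declared non-nullable but position $i of row $row is null")
  }
```

Running such a check over the RDD before `createDataFrame` would surface the bad row directly, which is the "more explicit" verification the issue asks for.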
[jira] [Updated] (SPARK-14459) SQL partitioning must match existing tables, but is not checked.
[ https://issues.apache.org/jira/browse/SPARK-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14459: --- Assignee: Ryan Blue > SQL partitioning must match existing tables, but is not checked. > > > Key: SPARK-14459 > URL: https://issues.apache.org/jira/browse/SPARK-14459 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue > Fix For: 2.0.0 > > > Writing into partitioned Hive tables has unexpected results because the > table's partitioning is not detected and applied during the analysis phase. > For example, if I have two tables, {{source}} and {{partitioned}}, with the > same column types: > {code} > CREATE TABLE source (id bigint, data string, part string); > CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part > string); > // copy from source to partitioned > sqlContext.table("source").write.insertInto("partitioned") > {code} > Copying from {{source}} to {{partitioned}} succeeds, but results in 0 rows. > This works if I explicitly partition by adding > {{...write.partitionBy("part").insertInto(...)}}. This work-around isn't > obvious and is prone to error because the {{partitionBy}} must match the > table's partitioning, though it is not checked. > I think when relations are resolved, the partitioning should be checked and > updated if it isn't set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14459) SQL partitioning must match existing tables, but is not checked.
[ https://issues.apache.org/jira/browse/SPARK-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-14459. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12239 [https://github.com/apache/spark/pull/12239] > SQL partitioning must match existing tables, but is not checked. > > > Key: SPARK-14459 > URL: https://issues.apache.org/jira/browse/SPARK-14459 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue > Fix For: 2.0.0 > > > Writing into partitioned Hive tables has unexpected results because the > table's partitioning is not detected and applied during the analysis phase. > For example, if I have two tables, {{source}} and {{partitioned}}, with the > same column types: > {code} > CREATE TABLE source (id bigint, data string, part string); > CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part > string); > // copy from source to partitioned > sqlContext.table("source").write.insertInto("partitioned") > {code} > Copying from {{source}} to {{partitioned}} succeeds, but results in 0 rows. > This works if I explicitly partition by adding > {{...write.partitionBy("part").insertInto(...)}}. This work-around isn't > obvious and is prone to error because the {{partitionBy}} must match the > table's partitioning, though it is not checked. > I think when relations are resolved, the partitioning should be checked and > updated if it isn't set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14459) SQL partitioning must match existing tables, but is not checked.
[ https://issues.apache.org/jira/browse/SPARK-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-14459: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 > SQL partitioning must match existing tables, but is not checked. > > > Key: SPARK-14459 > URL: https://issues.apache.org/jira/browse/SPARK-14459 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ryan Blue >Assignee: Ryan Blue > Fix For: 2.0.0 > > > Writing into partitioned Hive tables has unexpected results because the > table's partitioning is not detected and applied during the analysis phase. > For example, if I have two tables, {{source}} and {{partitioned}}, with the > same column types: > {code} > CREATE TABLE source (id bigint, data string, part string); > CREATE TABLE partitioned (id bigint, data string) PARTITIONED BY (part > string); > // copy from source to partitioned > sqlContext.table("source").write.insertInto("partitioned") > {code} > Copying from {{source}} to {{partitioned}} succeeds, but results in 0 rows. > This works if I explicitly partition by adding > {{...write.partitionBy("part").insertInto(...)}}. This work-around isn't > obvious and is prone to error because the {{partitionBy}} must match the > table's partitioning, though it is not checked. > I think when relations are resolved, the partitioning should be checked and > updated if it isn't set. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
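Editor's note: the work-around described above, spelled out; the column name passed to `partitionBy` must exactly match the Hive table's partition column, and (as the issue notes) this is not checked:

```scala
// Work-around prior to the fix: declare the partitioning explicitly.
// "part" must match the partition column of the target table `partitioned`.
sqlContext.table("source")
  .write
  .partitionBy("part")
  .insertInto("partitioned")
```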
[jira] [Commented] (SPARK-15218) Error: Could not find or load main class org.apache.spark.launcher.Main when run from a directory containing colon ':'
[ https://issues.apache.org/jira/browse/SPARK-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276129#comment-15276129 ] Viacheslav Saevskiy commented on SPARK-15218: - it's a Java-specific issue and I didn't find a way to escape ':' in the classpath. What Mesos version do you use? It's possible that this issue was fixed in Mesos 0.18: https://issues.apache.org/jira/browse/MESOS-1128 > Error: Could not find or load main class org.apache.spark.launcher.Main when > run from a directory containing colon ':' > -- > > Key: SPARK-15218 > URL: https://issues.apache.org/jira/browse/SPARK-15218 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell, Spark Submit >Affects Versions: 1.6.1 >Reporter: Adam Cecile > Labels: mesos > > {noformat} > mkdir /tmp/qwe:rtz > cd /tmp/qwe:rtz > wget > http://www-eu.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz > tar xvzf spark-1.6.1-bin-without-hadoop.tgz > cd spark-1.6.1-bin-without-hadoop/ > bin/spark-submit > {noformat} > Returns "Error: Could not find or load main class > org.apache.spark.launcher.Main". > That would not be such an issue if the Mesos executor did not have a colon in the > generated paths. It means without hacking (defining a relative SPARK_HOME path > by myself) there's no way to run a Spark job inside a Mesos job container... > Best regards, Adam. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15172) Warning message should explicitly tell user initial coefficients is ignored if its size doesn't match expected size in LogisticRegression
[ https://issues.apache.org/jira/browse/SPARK-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15172. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 12948 [https://github.com/apache/spark/pull/12948] > Warning message should explicitly tell user initial coefficients is ignored > if its size doesn't match expected size in LogisticRegression > - > > Key: SPARK-15172 > URL: https://issues.apache.org/jira/browse/SPARK-15172 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: ding >Assignee: ding >Priority: Trivial > Fix For: 2.1.0 > > > In the ML LogisticRegression code, if the size of the initial coefficients > doesn't match the expected size, the initial coefficients are ignored. We > should explicitly tell the user this. Besides, logging the size of the initial > coefficients is more straightforward than logging their values when a size > mismatch happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15222) SparkR ML examples update in 2.0
[ https://issues.apache.org/jira/browse/SPARK-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15222: Assignee: (was: Apache Spark) > SparkR ML examples update in 2.0 > > > Key: SPARK-15222 > URL: https://issues.apache.org/jira/browse/SPARK-15222 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang >Priority: Minor > > Update example code in examples/src/main/r/ml.R to reflect the new algorithms. > * spark.glm and glm > * spark.survreg > * spark.naiveBayes > * spark.kmeans -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15222) SparkR ML examples update in 2.0
[ https://issues.apache.org/jira/browse/SPARK-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276114#comment-15276114 ] Apache Spark commented on SPARK-15222: -- User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/13000 > SparkR ML examples update in 2.0 > > > Key: SPARK-15222 > URL: https://issues.apache.org/jira/browse/SPARK-15222 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang >Priority: Minor > > Update example code in examples/src/main/r/ml.R to reflect the new algorithms. > * spark.glm and glm > * spark.survreg > * spark.naiveBayes > * spark.kmeans -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-15222) SparkR ML examples update in 2.0
[ https://issues.apache.org/jira/browse/SPARK-15222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15222: Assignee: Apache Spark > SparkR ML examples update in 2.0 > > > Key: SPARK-15222 > URL: https://issues.apache.org/jira/browse/SPARK-15222 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang >Assignee: Apache Spark >Priority: Minor > > Update example code in examples/src/main/r/ml.R to reflect the new algorithms. > * spark.glm and glm > * spark.survreg > * spark.naiveBayes > * spark.kmeans -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276109#comment-15276109 ] Xin Hao edited comment on SPARK-4452 at 5/9/16 8:36 AM: Since this is an old issue which has impacted Spark since 1.1.0, can the patch be merged to Spark 1.6.X ? This will be very helpful for Spark 1.6.X users. Thanks. was (Author: xhao1): Since this is an old issue which has impacted Spark since 1.1.0, can the patch be merged to Spark 1.6.X ? Thanks. > Shuffle data structures can starve others on the same thread for memory > > > Key: SPARK-4452 > URL: https://issues.apache.org/jira/browse/SPARK-4452 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Tianshuo Deng >Assignee: Lianhui Wang > Fix For: 2.0.0 > > > When an Aggregator is used with ExternalSorter in a task, Spark will create > many small files and could cause a "too many open files" error during merging. > Currently, ShuffleMemoryManager does not work well when there are 2 spillable > objects in a thread, which are ExternalSorter and ExternalAppendOnlyMap (used > by Aggregator) in this case. Here is an example: Due to the usage of map-side > aggregation, ExternalAppendOnlyMap is created first to read the RDD. It may > ask for as much memory as it can, which is totalMem/numberOfThreads. Then later > on when ExternalSorter is created in the same thread, the > ShuffleMemoryManager could refuse to allocate more memory to it, since the > memory is already given to the previously requested > object (ExternalAppendOnlyMap). That causes the ExternalSorter to keep spilling > small files (due to the lack of memory). > I'm currently working on a PR to address these two issues. It will include > the following changes: > 1. The ShuffleMemoryManager should not only track the memory usage for each > thread, but also the object that holds the memory > 2. The ShuffleMemoryManager should be able to trigger the spilling of a > spillable object. In this way, if a new object in a thread is requesting > memory, the old occupant could be evicted/spilled. Previously the spillable > objects triggered spilling by themselves. So one may not trigger spilling even > if another object in the same thread needs more memory. After this change the > ShuffleMemoryManager can trigger the spilling of an object if it needs to. > 3. Make the iterator of ExternalAppendOnlyMap spillable. Previously > ExternalAppendOnlyMap returns a destructive iterator and cannot be spilled > after the iterator is returned. This should be changed so that even after the > iterator is returned, the ShuffleMemoryManager can still spill it. > Currently, I have a working branch in progress: > https://github.com/tsdeng/spark/tree/enhance_memory_manager. Already made > change 3 and have a prototype of changes 1 and 2 to evict spillables from the > memory manager, still in progress. I will send a PR when it's done. > Any feedback or thoughts on this change are highly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276109#comment-15276109 ] Xin Hao commented on SPARK-4452: Since this is an old issue which has impacted Spark since 1.1.0, can the patch be merged to Spark 1.6.X ? Thanks. > Shuffle data structures can starve others on the same thread for memory > > > Key: SPARK-4452 > URL: https://issues.apache.org/jira/browse/SPARK-4452 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Tianshuo Deng >Assignee: Lianhui Wang > Fix For: 2.0.0 > > > When an Aggregator is used with ExternalSorter in a task, Spark will create > many small files and could cause a "too many open files" error during merging. > Currently, ShuffleMemoryManager does not work well when there are 2 spillable > objects in a thread, which are ExternalSorter and ExternalAppendOnlyMap (used > by Aggregator) in this case. Here is an example: Due to the usage of map-side > aggregation, ExternalAppendOnlyMap is created first to read the RDD. It may > ask for as much memory as it can, which is totalMem/numberOfThreads. Then later > on when ExternalSorter is created in the same thread, the > ShuffleMemoryManager could refuse to allocate more memory to it, since the > memory is already given to the previously requested > object (ExternalAppendOnlyMap). That causes the ExternalSorter to keep spilling > small files (due to the lack of memory). > I'm currently working on a PR to address these two issues. It will include > the following changes: > 1. The ShuffleMemoryManager should not only track the memory usage for each > thread, but also the object that holds the memory > 2. The ShuffleMemoryManager should be able to trigger the spilling of a > spillable object. In this way, if a new object in a thread is requesting > memory, the old occupant could be evicted/spilled. Previously the spillable > objects triggered spilling by themselves. So one may not trigger spilling even > if another object in the same thread needs more memory. After this change the > ShuffleMemoryManager can trigger the spilling of an object if it needs to. > 3. Make the iterator of ExternalAppendOnlyMap spillable. Previously > ExternalAppendOnlyMap returns a destructive iterator and cannot be spilled > after the iterator is returned. This should be changed so that even after the > iterator is returned, the ShuffleMemoryManager can still spill it. > Currently, I have a working branch in progress: > https://github.com/tsdeng/spark/tree/enhance_memory_manager. Already made > change 3 and have a prototype of changes 1 and 2 to evict spillables from the > memory manager, still in progress. I will send a PR when it's done. > Any feedback or thoughts on this change are highly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
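Editor's note: proposed changes 1 and 2 above (track the owning consumer, not just the thread, and let the manager force a spill) can be illustrated with a toy bookkeeping sketch. This is our own illustration, not Spark's actual ShuffleMemoryManager, and it simplifies spilling to dropping the largest sibling consumer's accounting:

```scala
// Toy model of per-(thread, consumer) memory accounting. On pressure it
// "spills" (forgets) the largest OTHER consumer on the same thread, instead
// of starving the new requester. Single-eviction only, for brevity.
final class ToyMemoryManager(totalBytes: Long) {
  private var used = Map.empty[(Long, String), Long] // (threadId, consumer) -> bytes
  private def totalUsed: Long = used.values.sum

  /** Returns how many bytes were actually granted. */
  def acquire(threadId: Long, consumer: String, bytes: Long): Long = synchronized {
    if (totalUsed + bytes > totalBytes) {
      val victims = used.filter { case ((t, c), _) => t == threadId && c != consumer }
      if (victims.nonEmpty) {
        used -= victims.maxBy(_._2)._1 // manager-triggered spill of the biggest sibling
      }
    }
    val grant = math.min(bytes, totalBytes - totalUsed)
    if (grant > 0)
      used = used.updated((threadId, consumer),
        used.getOrElse((threadId, consumer), 0L) + grant)
    grant
  }

  def usedBy(threadId: Long, consumer: String): Long =
    used.getOrElse((threadId, consumer), 0L)
}
```

In this model an ExternalAppendOnlyMap that grabbed the whole budget would be spilled when the ExternalSorter on the same thread asks for memory, rather than the sorter being repeatedly refused.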
[jira] [Created] (SPARK-15222) SparkR ML examples update in 2.0
Yanbo Liang created SPARK-15222: --- Summary: SparkR ML examples update in 2.0 Key: SPARK-15222 URL: https://issues.apache.org/jira/browse/SPARK-15222 Project: Spark Issue Type: Improvement Components: ML, SparkR Reporter: Yanbo Liang Priority: Minor Update example code in examples/src/main/r/ml.R to reflect the new algorithms. * spark.glm and glm * spark.survreg * spark.naiveBayes * spark.kmeans -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15159) Remove usage of HiveContext in SparkR.
[ https://issues.apache.org/jira/browse/SPARK-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276100#comment-15276100 ] Vijay Parmar commented on SPARK-15159: -- 1. I looked at the source code at https://github.com/apache/spark/blob/master/R/pkg/R/SQLContext.R and found that "sparkRHivesc" appears on lines 193 and 194; other than that, it is not mentioned anywhere else in the code. I was a bit confused about whether this is the only change that needs to be made, or whether something else is needed as well. 2. I didn't feel the need for any change in the SparkR unit tests. Please let me know your opinion or suggestions so that I can proceed further on this. > Remove usage of HiveContext in SparkR. > -- > > Key: SPARK-15159 > URL: https://issues.apache.org/jira/browse/SPARK-15159 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.6.1 >Reporter: Sun Rui > > HiveContext is to be deprecated in 2.0. Replace it with > SparkSession.withHiveSupport in SparkR -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14898) MultivariateGaussian could use Cholesky in calculateCovarianceConstants
[ https://issues.apache.org/jira/browse/SPARK-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276090#comment-15276090 ] Sean Owen commented on SPARK-14898: --- I don't think the comment means that the SVD is used. It's noting that it could be used, but in this case, the desired product reduces to a simpler operation, an eigendecomposition. [~josephkb] I think this is perhaps out of date? > MultivariateGaussian could use Cholesky in calculateCovarianceConstants > --- > > Key: SPARK-14898 > URL: https://issues.apache.org/jira/browse/SPARK-14898 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > In spark.ml.stat.distribution.MultivariateGaussian, > calculateCovarianceConstants uses SVD. It might be more efficient to use > Cholesky. We should check other numerical libraries and see if we should > switch to Cholesky. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
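To see why Cholesky is attractive for calculateCovarianceConstants: the Gaussian density needs log|Sigma| (and a way to apply the inverse), and for a symmetric positive-definite Sigma a Cholesky factorization Sigma = L L^T yields the log-determinant almost for free as 2 * sum(log L[i][i]). A small self-contained sketch (plain Python, not Spark's code):

```python
import math

# Cholesky factorization of a small symmetric positive-definite matrix,
# then the log-determinant from the diagonal of the factor. This is the
# quantity a multivariate Gaussian density needs; an SVD or eigen-
# decomposition computes it too, but at higher cost.

def cholesky(a):
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

sigma = [[4.0, 2.0],
         [2.0, 3.0]]
L = cholesky(sigma)
log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
# det(sigma) = 4*3 - 2*2 = 8, so log_det should equal log(8)
print(abs(log_det - math.log(8.0)) < 1e-12)  # -> True
```

The caveat Sean's comment hints at is that Cholesky requires strict positive definiteness, while an eigendecomposition can tolerate (and pseudo-invert) a singular covariance, which may be why the existing code avoids it.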
[jira] [Resolved] (SPARK-15136) Linkify ML PyDoc
[ https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15136. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12918 [https://github.com/apache/spark/pull/12918] > Linkify ML PyDoc > > > Key: SPARK-15136 > URL: https://issues.apache.org/jira/browse/SPARK-15136 > Project: Spark > Issue Type: Improvement >Reporter: holdenk >Priority: Minor > Fix For: 2.0.0 > > > PyDoc links in ml are in non-standard format. Switch to standard sphinx link > format for better formatted documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15136) Linkify ML PyDoc
[ https://issues.apache.org/jira/browse/SPARK-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-15136: -- Assignee: holdenk > Linkify ML PyDoc > > > Key: SPARK-15136 > URL: https://issues.apache.org/jira/browse/SPARK-15136 > Project: Spark > Issue Type: Improvement >Reporter: holdenk >Assignee: holdenk >Priority: Minor > Fix For: 2.0.0 > > > PyDoc links in ml are in non-standard format. Switch to standard sphinx link > format for better formatted documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15221. --- Resolution: Not A Problem Target Version/s: (was: 1.6.1) Exactly, this is the problem: Caused by: java.sql.SQLException: Directory /home/metastore_db cannot be created. You probably don't have permission to create that dir, but it's also probably not where you meant it to be created. You'd have to determine why you're trying to write into /home, but that's not a Spark issue per se. Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first > error: not found: value sqlContext when starting Spark 1.6.1 > > > Key: SPARK-15221 > URL: https://issues.apache.org/jira/browse/SPARK-15221 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor >Reporter: Vijay Parmar >Priority: Blocker > Labels: build, newbie > > When I start Spark (version 1.6.1), at the very end I am getting the > following error message: > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16: error: not found: value sqlContext > import sqlContext.sql > I have gone through some content on the web about editing the /.bashrc file > and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. > Also tried editing the /etc/hosts file with :- > $ sudo vi /etc/hosts > ... > 127.0.0.1 > ... > but still the issue persists. Is it the issue with the build or something > else? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
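The diagnosis above can be checked outside Spark: with the default embedded Derby metastore, metastore_db is created under the directory the shell was started from, so the question is simply whether that working directory is writable. A minimal diagnostic sketch (plain Python; the helper name is hypothetical):

```python
import os
import tempfile

def can_create_metastore_db(workdir):
    # Derby needs to create the directory <workdir>/metastore_db, which
    # requires write and execute permission on workdir itself. If this
    # returns False, spark-shell started from workdir will fail with
    # "Directory .../metastore_db cannot be created", and sqlContext
    # never gets defined.
    return os.access(workdir, os.W_OK | os.X_OK)

writable = tempfile.mkdtemp()              # stand-in for a user-owned dir
print(can_create_metastore_db(writable))   # -> True
```

The practical fix implied by the resolution is simply to start spark-shell from a directory the user can write to, rather than from /home.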
[jira] [Resolved] (SPARK-13064) api/v1/application/jobs/attempt lacks "attempId" field for spark-shell
[ https://issues.apache.org/jira/browse/SPARK-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13064. --- Resolution: Won't Fix > api/v1/application/jobs/attempt lacks "attempId" field for spark-shell > -- > > Key: SPARK-13064 > URL: https://issues.apache.org/jira/browse/SPARK-13064 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Reporter: Zhuo Liu >Priority: Minor > > For any application launches with spark-shell will not have attemptId field > in their rest API. From the REST API point of view, we might want to force an > Id for it, i.e., "1". > {code} > { > "id" : "application_1453789230389_377545", > "name" : "PySparkShell", > "attempts" : [ { > "startTime" : "2016-01-28T02:17:11.035GMT", > "endTime" : "2016-01-28T02:30:01.355GMT", > "lastUpdated" : "2016-01-28T02:30:01.516GMT", > "duration" : 770320, > "sparkUser" : "huyng", > "completed" : true > } ] > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276071#comment-15276071 ] Vijay Parmar commented on SPARK-15221: -- Well it's a whole lot that gets generated before the one I posted but still sharing some of the part :- Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details. at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.createDatabase(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection40.(Unknown Source) at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source) at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source) at org.apache.derby.jdbc.Driver20.connect(Unknown Source) at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:208) at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:78) at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148) at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501) at org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:298) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) ... 104 more Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) ... 131 more Caused by: java.sql.SQLException: Directory /home/metastore_db cannot be created. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) ... 128 more Caused by: ERROR XBM0H: Directory /home/metastore_db cannot be created. 
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.services.monitor.StorageFactoryService$10.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.impl.services.monitor.StorageFactoryService.createServiceRoot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.createPersistentService(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.createPersistentService(Unknown Source) :16: error: not found: value sqlContext import sqlContext.implicits._ ^ :16: error: not found: value sqlContext import sqlContext.sql > error: not found: value sqlContext when starting Spark 1.6.1 >
[jira] [Commented] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276064#comment-15276064 ] Sean Owen commented on SPARK-15221: --- This virtually always happens when an earlier error occurred. Check farther up the console output. > error: not found: value sqlContext when starting Spark 1.6.1 > > > Key: SPARK-15221 > URL: https://issues.apache.org/jira/browse/SPARK-15221 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor >Reporter: Vijay Parmar >Priority: Blocker > Labels: build, newbie > > When I start Spark (version 1.6.1), at the very end I am getting the > following error message: > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16: error: not found: value sqlContext > import sqlContext.sql > I have gone through some content on the web about editing the /.bashrc file > and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. > Also tried editing the /etc/hosts file with :- > $ sudo vi /etc/hosts > ... > 127.0.0.1 > ... > but still the issue persists. Is it the issue with the build or something > else? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Parmar updated SPARK-15221: - Priority: Blocker (was: Minor) > error: not found: value sqlContext when starting Spark 1.6.1 > > > Key: SPARK-15221 > URL: https://issues.apache.org/jira/browse/SPARK-15221 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor >Reporter: Vijay Parmar >Priority: Blocker > Labels: build, newbie > > When I start Spark (version 1.6.1), at the very end I am getting the > following error message: > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16: error: not found: value sqlContext > import sqlContext.sql > I have gone through some content on the web about editing the /.bashrc file > and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. > Also tried editing the /etc/hosts file with :- > $ sudo vi /etc/hosts > ... > 127.0.0.1 > ... > but still the issue persists. Is it the issue with the build or something > else? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
[ https://issues.apache.org/jira/browse/SPARK-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Parmar updated SPARK-15221: - Description: When I start Spark (version 1.6.1), at the very end I am getting the following error message: :16: error: not found: value sqlContext import sqlContext.implicits._ ^ :16: error: not found: value sqlContext import sqlContext.sql I have gone through some content on the web about editing the /.bashrc file and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. Also tried editing the /etc/hosts file with :- $ sudo vi /etc/hosts ... 127.0.0.1 ... but still the issue persists. Is it the issue with the build or something else? was: When I start Spark (version 1.6.1), at the very end I am getting the following error message: :16: error: not found: value sqlContext import sqlContext.implicits._ ^ :16: error: not found: value sqlContext import sqlContext.sql I have gone through some content on the web about editing the /.bashrc file and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. Also tried editing the /etc/hosts file with :- $ sudo vi /etc/hosts ... 127.0.0.1 ... but still the issue persists. Is it the issue with the build or something else? > error: not found: value sqlContext when starting Spark 1.6.1 > > > Key: SPARK-15221 > URL: https://issues.apache.org/jira/browse/SPARK-15221 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor >Reporter: Vijay Parmar >Priority: Minor > Labels: build, newbie > > When I start Spark (version 1.6.1), at the very end I am getting the > following error message: > :16: error: not found: value sqlContext > import sqlContext.implicits._ > ^ > :16: error: not found: value sqlContext > import sqlContext.sql > I have gone through some content on the web about editing the /.bashrc file > and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. 
> Also tried editing the /etc/hosts file with :- > $ sudo vi /etc/hosts > ... > 127.0.0.1 > ... > but still the issue persists. Is it the issue with the build or something > else? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15221) error: not found: value sqlContext when starting Spark 1.6.1
Vijay Parmar created SPARK-15221: Summary: error: not found: value sqlContext when starting Spark 1.6.1 Key: SPARK-15221 URL: https://issues.apache.org/jira/browse/SPARK-15221 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.1 Environment: Ubuntu 14.0.4, 8 GB RAM, 1 Processor Reporter: Vijay Parmar Priority: Minor When I start Spark (version 1.6.1), at the very end I am getting the following error message: :16: error: not found: value sqlContext import sqlContext.implicits._ ^ :16: error: not found: value sqlContext import sqlContext.sql I have gone through some content on the web about editing the /.bashrc file and including the "SPARK_LOCAL_IP=127.0.0.1" under SPARK variables. Also tried editing the /etc/hosts file with :- $ sudo vi /etc/hosts ... 127.0.0.1 ... but still the issue persists. Is it the issue with the build or something else? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15219) [Spark SQL] it don't support to detect runtime temporary table for enabling broadcast hash join optimization
[ https://issues.apache.org/jira/browse/SPARK-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276055#comment-15276055 ] Yi Zhou commented on SPARK-15219: - Posted the core physical plan > [Spark SQL] it don't support to detect runtime temporary table for enabling > broadcast hash join optimization > > > Key: SPARK-15219 > URL: https://issues.apache.org/jira/browse/SPARK-15219 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yi Zhou > > We observed an interesting thing about broadcast Hash join( similar to Map > Join in Hive) when comparing the implementation by Hive on MR engine. The > blew query is a multi-way join operation based on 3 tables including > product_reviews, 2 run-time temporary result tables(fsr and fwr) from > ‘select’ query operation and also there is a two-way join(1 table and 1 > run-time temporary table) in both 'fsr' and 'fwr',which cause slower > performance than Hive on MR. We investigated the difference between Spark SQL > and Hive on MR engine and found that there are total of 5 map join tasks with > tuned map join parameters in Hive on MR but there are only 2 broadcast hash > join tasks in Spark SQL even if we set a larger threshold(e.g.,1GB) for > broadcast hash join. From our investigation, it seems that if there is > run-time temporary table in join operation in Spark SQL engine it will not > detect such table for enabling broadcast hash join optimization. 
> Core SQL snippet: > {code} > INSERT INTO TABLE q19_spark_sql_power_test_0_result > SELECT * > FROM > ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in > select clause is not allowed > SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS > ( > item_sk, > review_sentence, > sentiment, > sentiment_word > ) > FROM product_reviews pr, > ( > --store returns in week ending given date > SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty > FROM store_returns sr, > ( > -- within the week ending a given date > SELECT d1.d_date_sk > FROM date_dim d1, date_dim d2 > WHERE d1.d_week_seq = d2.d_week_seq > AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', > '2004-12-20' ) > ) sr_dateFilter > WHERE sr.sr_returned_date_sk = d_date_sk > GROUP BY sr_item_sk --across all store and web channels > HAVING sr_item_qty > 0 > ) fsr, > ( > --web returns in week ending given date > SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty > FROM web_returns wr, > ( > -- within the week ending a given date > SELECT d1.d_date_sk > FROM date_dim d1, date_dim d2 > WHERE d1.d_week_seq = d2.d_week_seq > AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', > '2004-12-20' ) > ) wr_dateFilter > WHERE wr.wr_returned_date_sk = d_date_sk > GROUP BY wr_item_sk --across all store and web channels > HAVING wr_item_qty > 0 > ) fwr > WHERE fsr.sr_item_sk = fwr.wr_item_sk > AND pr.pr_item_sk = fsr.sr_item_sk --extract product_reviews for found items > -- equivalent across all store and web channels (within a tolerance of +/- > 10%) > AND abs( (sr_item_qty-wr_item_qty)/ ((sr_item_qty+wr_item_qty)/2)) <= 0.1 > )extractedSentiments > WHERE sentiment= 'NEG' --if there are any major negative reviews. 
> ORDER BY item_sk,review_sentence,sentiment,sentiment_word > ; > {code} > Physical Plan: > {code} > == Physical Plan == > InsertIntoHiveTable MetastoreRelation bigbench_3t_sparksql, > q19_spark_sql_run_query_0_result, None, Map(), false, false > +- ConvertToSafe >+- Sort [item_sk#537L ASC,review_sentence#538 ASC,sentiment#539 > ASC,sentiment_word#540 ASC], true, 0 > +- ConvertToUnsafe > +- Exchange rangepartitioning(item_sk#537L ASC,review_sentence#538 > ASC,sentiment#539 ASC,sentiment_word#540 ASC,200), None > +- ConvertToSafe >+- Project > [item_sk#537L,review_sentence#538,sentiment#539,sentiment_word#540] > +- Filter (sentiment#539 = NEG) > +- !Generate > HiveGenericUDTF#io.bigdatabenchmark.v1.queries.q10.SentimentUDF(pr_item_sk#363L,pr_review_content#366), > false, false, > [item_sk#537L,review_sentence#538,sentiment#539,sentiment_word#540] > +- ConvertToSafe >+- Project [pr_item_sk#363L,pr_review_content#366] > +- Filter (abs((cast((sr_item_qty#356L - > wr_item_qty#357L) as double) / (cast((sr_item_qty#356L + wr_item_qty#357L) as > double) / 2.0))) <= 0.1) > +- SortMergeJoin [sr_item_sk#369L], > [wr_item_sk#445L] > :- Sort
[jira] [Updated] (SPARK-15219) [Spark SQL] it don't support to detect runtime temporary table for enabling broadcast hash join optimization
[ https://issues.apache.org/jira/browse/SPARK-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-15219: Description: We observed an interesting thing about broadcast Hash join( similar to Map Join in Hive) when comparing the implementation by Hive on MR engine. The blew query is a multi-way join operation based on 3 tables including product_reviews, 2 run-time temporary result tables(fsr and fwr) from ‘select’ query operation and also there is a two-way join(1 table and 1 run-time temporary table) in both 'fsr' and 'fwr',which cause slower performance than Hive on MR. We investigated the difference between Spark SQL and Hive on MR engine and found that there are total of 5 map join tasks with tuned map join parameters in Hive on MR but there are only 2 broadcast hash join tasks in Spark SQL even if we set a larger threshold(e.g.,1GB) for broadcast hash join. From our investigation, it seems that if there is run-time temporary table in join operation in Spark SQL engine it will not detect such table for enabling broadcast hash join optimization. 
Core SQL snippet: {code} INSERT INTO TABLE q19_spark_sql_power_test_0_result SELECT * FROM ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in select clause is not allowed SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS ( item_sk, review_sentence, sentiment, sentiment_word ) FROM product_reviews pr, ( --store returns in week ending given date SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty FROM store_returns sr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) sr_dateFilter WHERE sr.sr_returned_date_sk = d_date_sk GROUP BY sr_item_sk --across all store and web channels HAVING sr_item_qty > 0 ) fsr, ( --web returns in week ending given date SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty FROM web_returns wr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) wr_dateFilter WHERE wr.wr_returned_date_sk = d_date_sk GROUP BY wr_item_sk --across all store and web channels HAVING wr_item_qty > 0 ) fwr WHERE fsr.sr_item_sk = fwr.wr_item_sk AND pr.pr_item_sk = fsr.sr_item_sk --extract product_reviews for found items -- equivalent across all store and web channels (within a tolerance of +/- 10%) AND abs( (sr_item_qty-wr_item_qty)/ ((sr_item_qty+wr_item_qty)/2)) <= 0.1 )extractedSentiments WHERE sentiment= 'NEG' --if there are any major negative reviews. 
ORDER BY item_sk,review_sentence,sentiment,sentiment_word ; {code} Physical Plan: {code} == Physical Plan == InsertIntoHiveTable MetastoreRelation bigbench_3t_sparksql, q19_spark_sql_run_query_0_result, None, Map(), false, false +- ConvertToSafe +- Sort [item_sk#537L ASC,review_sentence#538 ASC,sentiment#539 ASC,sentiment_word#540 ASC], true, 0 +- ConvertToUnsafe +- Exchange rangepartitioning(item_sk#537L ASC,review_sentence#538 ASC,sentiment#539 ASC,sentiment_word#540 ASC,200), None +- ConvertToSafe +- Project [item_sk#537L,review_sentence#538,sentiment#539,sentiment_word#540] +- Filter (sentiment#539 = NEG) +- !Generate HiveGenericUDTF#io.bigdatabenchmark.v1.queries.q10.SentimentUDF(pr_item_sk#363L,pr_review_content#366), false, false, [item_sk#537L,review_sentence#538,sentiment#539,sentiment_word#540] +- ConvertToSafe +- Project [pr_item_sk#363L,pr_review_content#366] +- Filter (abs((cast((sr_item_qty#356L - wr_item_qty#357L) as double) / (cast((sr_item_qty#356L + wr_item_qty#357L) as double) / 2.0))) <= 0.1) +- SortMergeJoin [sr_item_sk#369L], [wr_item_sk#445L] :- Sort [sr_item_sk#369L ASC], false, 0 : +- Project [pr_item_sk#363L,sr_item_qty#356L,pr_review_content#366,sr_item_sk#369L] : +- SortMergeJoin [pr_item_sk#363L], [sr_item_sk#369L] ::- Sort [pr_item_sk#363L ASC], false, 0 :: +- TungstenExchange hashpartitioning(pr_item_sk#363L,200), None :: +- ConvertToUnsafe ::+- HiveTableScan [pr_item_sk#363L,pr_review_content#366], MetastoreRelation bigbench_3t_sparksql, product_reviews, Some(pr)
[jira] [Commented] (SPARK-15219) [Spark SQL] it don't support to detect runtime temporary table for enabling broadcast hash join optimization
[ https://issues.apache.org/jira/browse/SPARK-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276042#comment-15276042 ] Herman van Hovell commented on SPARK-15219: --- [~jameszhouyi] Could you also post the query plan? Use either {{explain extended ...}} in SQL or {{df.explain(true)}} using dataframes. > [Spark SQL] it don't support to detect runtime temporary table for enabling > broadcast hash join optimization > > > Key: SPARK-15219 > URL: https://issues.apache.org/jira/browse/SPARK-15219 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yi Zhou > > We observed an interesting thing about broadcast Hash join( similar to Map > Join in Hive) when comparing the implementation by Hive on MR engine. The > blew query is a multi-way join operation based on 3 tables including > product_reviews, 2 run-time temporary result tables(fsr and fwr) from > ‘select’ query operation and also there is a two-way join(1 table and 1 > run-time temporary table) in both 'fsr' and 'fwr',which cause slower > performance than Hive on MR. We investigated the difference between Spark SQL > and Hive on MR engine and found that there are total of 5 map join tasks with > tuned map join parameters in Hive on MR but there are only 2 broadcast hash > join tasks in Spark SQL even if we set a larger threshold(e.g.,1GB) for > broadcast hash join. From our investigation, it seems that if there is > run-time temporary table in join operation in Spark SQL engine it will not > detect such table for enabling broadcast hash join optimization. 
> Core SQL snippet: > {code} > INSERT INTO TABLE q19_spark_sql_power_test_0_result > SELECT * > FROM > ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in > select clause is not allowed > SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS > ( > item_sk, > review_sentence, > sentiment, > sentiment_word > ) > FROM product_reviews pr, > ( > --store returns in week ending given date > SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty > FROM store_returns sr, > ( > -- within the week ending a given date > SELECT d1.d_date_sk > FROM date_dim d1, date_dim d2 > WHERE d1.d_week_seq = d2.d_week_seq > AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', > '2004-12-20' ) > ) sr_dateFilter > WHERE sr.sr_returned_date_sk = d_date_sk > GROUP BY sr_item_sk --across all store and web channels > HAVING sr_item_qty > 0 > ) fsr, > ( > --web returns in week ending given date > SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty > FROM web_returns wr, > ( > -- within the week ending a given date > SELECT d1.d_date_sk > FROM date_dim d1, date_dim d2 > WHERE d1.d_week_seq = d2.d_week_seq > AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', > '2004-12-20' ) > ) wr_dateFilter > WHERE wr.wr_returned_date_sk = d_date_sk > GROUP BY wr_item_sk --across all store and web channels > HAVING wr_item_qty > 0 > ) fwr > WHERE fsr.sr_item_sk = fwr.wr_item_sk > AND pr.pr_item_sk = fsr.sr_item_sk --extract product_reviews for found items > -- equivalent across all store and web channels (within a tolerance of +/- > 10%) > AND abs( (sr_item_qty-wr_item_qty)/ ((sr_item_qty+wr_item_qty)/2)) <= 0.1 > )extractedSentiments > WHERE sentiment= 'NEG' --if there are any major negative reviews. > ORDER BY item_sk,review_sentence,sentiment,sentiment_word > ; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
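A plausible reading of the behavior reported above is that size-based join selection needs a size estimate for the build side, and a derived (runtime) subquery result like fsr or fwr may carry no statistics, so it can never pass the broadcast threshold check regardless of how large the threshold is set. A toy model of that decision (hypothetical names; Spark's actual planner logic differs in detail):

```python
# Toy size-based join planning: broadcast hash join is chosen only when a
# side's estimated size is KNOWN and below the threshold. A derived
# relation without statistics defaults to "unknown", so the planner falls
# back to sort-merge join even if the actual result is tiny.

BROADCAST_THRESHOLD = 10 * 1024 * 1024  # e.g. 10 MB

class Relation:
    def __init__(self, name, size_bytes=None):
        self.name = name
        self.size_bytes = size_bytes  # None = no statistics available

def choose_join(left, right):
    for side in (left, right):
        if side.size_bytes is not None and side.size_bytes <= BROADCAST_THRESHOLD:
            return f"BroadcastHashJoin(build={side.name})"
    return "SortMergeJoin"

base = Relation("date_dim", size_bytes=2 * 1024 * 1024)   # catalog stats known
derived = Relation("fsr", size_bytes=None)                 # aggregate subquery
big = Relation("store_returns", size_bytes=50 * 1024**3)

print(choose_join(big, base))      # -> BroadcastHashJoin(build=date_dim)
print(choose_join(derived, big))   # -> SortMergeJoin
```

This matches the observation in the report: joins against catalog tables with statistics become broadcast hash joins, while joins involving the runtime temporary results stay as SortMergeJoin in the physical plan.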
[jira] [Updated] (SPARK-15211) Select features column from LibSVMRelation causes failure
[ https://issues.apache.org/jira/browse/SPARK-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15211: --- Affects Version/s: 2.0.0 Target Version/s: 2.0.0 Description: It will cause failure when trying to load data with LibSVMRelation and select features column: {code} val df2 = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") df: org.apache.spark.sql.DataFrame = [label: double, features: vector] scala> df2.select("features").show java.lang.RuntimeException: Error while decoding: scala.MatchError: 19 (of class java.lang.Byte) createexternalrow(if (isnull(input[0, vector])) null else newInstance(class org.apache.spark.mllib.linalg.VectorUDT).deserialize, StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true)) ... {code} was: It will cause failure when trying to load data with LibSVMRelation and select features column: {code} val df2 = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") df: org.apache.spark.sql.DataFrame = [label: double, features: vector] scala> df2.select("features").show java.lang.RuntimeException: Error while decoding: scala.MatchError: 19 (of class java.lang.Byte) createexternalrow(if (isnull(input[0, vector])) null else newInstance(class org.apache.spark.mllib.linalg.VectorUDT).deserialize, StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true)) ... 
{code} > Select features column from LibSVMRelation causes failure > - > > Key: SPARK-15211 > URL: https://issues.apache.org/jira/browse/SPARK-15211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > It will cause failure when trying to load data with LibSVMRelation and select > features column: > {code} > val df2 = > spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") > df: org.apache.spark.sql.DataFrame = [label: double, features: vector] > scala> df2.select("features").show > java.lang.RuntimeException: Error while decoding: scala.MatchError: 19 (of > class java.lang.Byte) > createexternalrow(if (isnull(input[0, vector])) null else newInstance(class > org.apache.spark.mllib.linalg.VectorUDT).deserialize, > StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true)) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15211) Select features column from LibSVMRelation causes failure
[ https://issues.apache.org/jira/browse/SPARK-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-15211. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12986 [https://github.com/apache/spark/pull/12986] > Select features column from LibSVMRelation causes failure > - > > Key: SPARK-15211 > URL: https://issues.apache.org/jira/browse/SPARK-15211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > It will cause failure when trying to load data with LibSVMRelation and select > features column: > {code} > val df2 = > spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") > df: org.apache.spark.sql.DataFrame = [label: double, features: vector] > scala> df2.select("features").show > java.lang.RuntimeException: Error while decoding: scala.MatchError: 19 (of > class java.lang.Byte) > createexternalrow(if (isnull(input[0, vector])) null else newInstance(class > org.apache.spark.mllib.linalg.VectorUDT).deserialize, > StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true)) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15211) Select features column from LibSVMRelation causes failure
[ https://issues.apache.org/jira/browse/SPARK-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-15211: --- Assignee: Liang-Chi Hsieh > Select features column from LibSVMRelation causes failure > - > > Key: SPARK-15211 > URL: https://issues.apache.org/jira/browse/SPARK-15211 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > It will cause failure when trying to load data with LibSVMRelation and select > features column: > {code} > val df2 = > spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") > df: org.apache.spark.sql.DataFrame = [label: double, features: vector] > scala> df2.select("features").show > java.lang.RuntimeException: Error while decoding: scala.MatchError: 19 (of > class java.lang.Byte) > createexternalrow(if (isnull(input[0, vector])) null else newInstance(class > org.apache.spark.mllib.linalg.VectorUDT).deserialize, > StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true)) > ... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-928) Add support for Unsafe-based serializer in Kryo 2.22
[ https://issues.apache.org/jira/browse/SPARK-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265805#comment-15265805 ] Sandeep Singh edited comment on SPARK-928 at 5/9/16 7:02 AM: - [~joshrosen] I would like to work on this. I tried benchmarking the difference between unsafe Kryo and our current implementation, and then we can have a spark.kryo.useUnsafe flag as Matei has mentioned. {code:title=Benchmarking results|borderStyle=solid}
Benchmark Kryo Unsafe vs safe Serialization:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
basicTypes: Int unsafe:true                          160 /  178         98.5          10.1       1.0X
basicTypes: Long unsafe:true                         210 /  218         74.9          13.4       0.8X
basicTypes: Float unsafe:true                        203 /  213         77.5          12.9       0.8X
basicTypes: Double unsafe:true                       226 /  235         69.5          14.4       0.7X
Array: Int unsafe:true                              1087 / 1101         14.5          69.1       0.1X
Array: Long unsafe:true                             2758 / 2844          5.7         175.4       0.1X
Array: Float unsafe:true                            1511 / 1552         10.4          96.1       0.1X
Array: Double unsafe:true                           2942 / 2972          5.3         187.0       0.1X
Map of string->Double unsafe:true                   2645 / 2739          5.9         168.2       0.1X
basicTypes: Int unsafe:false                         211 /  218         74.7          13.4       0.8X
basicTypes: Long unsafe:false                        247 /  253         63.6          15.7       0.6X
basicTypes: Float unsafe:false                       211 /  216         74.5          13.4       0.8X
basicTypes: Double unsafe:false                      227 /  233         69.2          14.4       0.7X
Array: Int unsafe:false                             3012 / 3032          5.2         191.5       0.1X
Array: Long unsafe:false                            4463 / 4515          3.5         283.8       0.0X
Array: Float unsafe:false                           2788 / 2868          5.6         177.2       0.1X
Array: Double unsafe:false                          3558 / 3752          4.4         226.2       0.0X
Map of string->Double unsafe:false                  2806 / 2933          5.6         178.4       0.1X
{code} You can find the code for benchmarking here (https://github.com/techaddict/spark/commit/46fa44141c849ca15bbd6136cea2fa52bd927da2); it is very ugly right now, but I will improve it (add more benchmarks) before creating a PR. was (Author: techaddict): [~joshrosen] I would like to work on this. I tried benchmarking the difference between unsafe kryo and our current impl. and then we can have a spark.kryo.useUnsafe flag as Matei has mentioned. 
{code:title=Benchmarking results|borderStyle=solid}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11.4
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Benchmark Kryo Unsafe vs safe Serialization:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
basicTypes: Int unsafe:false                           2 /    4       8988.0           0.1       1.0X
basicTypes: Long unsafe:false                          1 /    1      13981.3           0.1       1.6X
basicTypes: Float unsafe:false                         1 /    1      14460.6           0.1       1.6X
basicTypes: Double unsafe:false                        1 /    1      15876.9           0.1       1.8X
Array: Int unsafe:false                               33 /   44        474.8           2.1       0.1X
Array: Long unsafe:false                              18 /   25        888.6           1.1       0.1X
Array: Float unsafe:false                             10 /   16       1627.4           0.6       0.2X
Array: Double unsafe:false                            10 /   13       1523.1           0.7       0.2X
Map of string->Double unsafe:false                   413 /  447         38.1          26.3       0.0X
basicTypes: Int unsafe:true                            1 /    1      16402.6           0.1       1.8X
basicTypes: Long unsafe:true                           1 /    1      19732.1           0.1       2.2X
basicTypes: Float unsafe:true                          1 /    1      19752.9           0.1       2.2X
basicTypes: Double unsafe:true                         1 /    1      23111.4           0.0       2.6X
Array: Int unsafe:true                                 7 /    8       2239.9           0.4       0.2X
Array: Long unsafe:true                                8 /    9       2000.1           0.5       0.2X
Array: Float unsafe:true
[jira] [Created] (SPARK-15220) Add hyperlink to "running application" and "completed application"
Mao, Wei created SPARK-15220: Summary: Add hyperlink to "running application" and "completed application" Key: SPARK-15220 URL: https://issues.apache.org/jira/browse/SPARK-15220 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Mao, Wei Priority: Minor Add hyperlinks to "running applications" and "completed applications" so users can jump to the application tables directly. In my environment, I set up 1000+ workers and it is painful to scroll down past the worker list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15219) [Spark SQL] It doesn't detect runtime temporary tables for enabling broadcast hash join optimization
[ https://issues.apache.org/jira/browse/SPARK-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-15219: Description: We observed an interesting behavior of broadcast hash join (similar to map join in Hive) when comparing against the Hive on MR engine. The query below is a multi-way join over 3 tables: product_reviews and 2 run-time temporary result tables (fsr and fwr) produced by SELECT subqueries; there is also a two-way join (1 table and 1 run-time temporary table) inside both 'fsr' and 'fwr', which causes slower performance than Hive on MR. We investigated the difference between Spark SQL and the Hive on MR engine and found that Hive on MR runs a total of 5 map join tasks with tuned map join parameters, while Spark SQL runs only 2 broadcast hash join tasks even when we set a larger threshold (e.g., 1 GB) for broadcast hash join. From our investigation, it seems that when a run-time temporary table participates in a join, the Spark SQL engine does not detect it as a candidate for the broadcast hash join optimization. 
Core SQL snippet: {code} INSERT INTO TABLE q19_spark_sql_power_test_0_result SELECT * FROM ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in select clause is not allowed SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS ( item_sk, review_sentence, sentiment, sentiment_word ) FROM product_reviews pr, ( --store returns in week ending given date SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty FROM store_returns sr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) sr_dateFilter WHERE sr.sr_returned_date_sk = d_date_sk GROUP BY sr_item_sk --across all store and web channels HAVING sr_item_qty > 0 ) fsr, ( --web returns in week ending given date SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty FROM web_returns wr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) wr_dateFilter WHERE wr.wr_returned_date_sk = d_date_sk GROUP BY wr_item_sk --across all store and web channels HAVING wr_item_qty > 0 ) fwr WHERE fsr.sr_item_sk = fwr.wr_item_sk AND pr.pr_item_sk = fsr.sr_item_sk --extract product_reviews for found items -- equivalent across all store and web channels (within a tolerance of +/- 10%) AND abs( (sr_item_qty-wr_item_qty)/ ((sr_item_qty+wr_item_qty)/2)) <= 0.1 )extractedSentiments WHERE sentiment= 'NEG' --if there are any major negative reviews. ORDER BY item_sk,review_sentence,sentiment,sentiment_word ; {code} was: We observed an interesting thing about broadcast Hash join( similar to Map Join in Hive) when comparing the implementation by Hive on MR engine. 
The blew query is a multi-way join operation based on 3 tables including product_reviews, 2 run-time temporary result tables(fsr and fwr) from ‘select’ query operation and also there is a two-way join(1 table and 1 run-time temporary table) in both 'fsr' and 'fwr'. We investigated the difference between Spark SQL and Hive on MR engine and found that there are total of 5 map join tasks with tuned map join parameters in Hive on MR but there are only 2 broadcast hash join tasks in Spark SQL even if we set a larger threshold(e.g.,1GB) for broadcast hash join. From our investigation, it seems that if there is run-time temporary table in join operation in Spark SQL engine it will not detect such table for enabling broadcast hash join optimization. Core SQL snippet: {code} INSERT INTO TABLE q19_spark_sql_power_test_0_result SELECT * FROM ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in select clause is not allowed SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS ( item_sk, review_sentence, sentiment, sentiment_word ) FROM product_reviews pr, ( --store returns in week ending given date SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty FROM store_returns sr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) sr_dateFilter WHERE sr.sr_returned_date_sk = d_date_sk GROUP BY sr_item_sk --across all store and web channels HAVING sr_item_qty > 0 ) fsr, ( --web returns in week ending given date SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty FROM web_returns wr, ( -- within the
[jira] [Created] (SPARK-15219) [Spark SQL] It doesn't detect runtime temporary tables for enabling broadcast hash join optimization
Yi Zhou created SPARK-15219: --- Summary: [Spark SQL] It doesn't detect runtime temporary tables for enabling broadcast hash join optimization Key: SPARK-15219 URL: https://issues.apache.org/jira/browse/SPARK-15219 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yi Zhou We observed an interesting behavior of broadcast hash join (similar to map join in Hive) when comparing against the Hive on MR engine. The query below is a multi-way join over 3 tables: product_reviews and 2 run-time temporary result tables (fsr and fwr) produced by SELECT subqueries; there is also a two-way join (1 table and 1 run-time temporary table) inside both 'fsr' and 'fwr'. We investigated the difference between Spark SQL and the Hive on MR engine and found that Hive on MR runs a total of 5 map join tasks with tuned map join parameters, while Spark SQL runs only 2 broadcast hash join tasks even when we set a larger threshold (e.g., 1 GB) for broadcast hash join. From our investigation, it seems that when a run-time temporary table participates in a join, the Spark SQL engine does not detect it as a candidate for the broadcast hash join optimization. 
Core SQL snippet: {code} INSERT INTO TABLE q19_spark_sql_power_test_0_result SELECT * FROM ( --wrap in additional FROM(), because Sorting/distribute by with UDTF in select clause is not allowed SELECT extract_sentiment(pr.pr_item_sk, pr.pr_review_content) AS ( item_sk, review_sentence, sentiment, sentiment_word ) FROM product_reviews pr, ( --store returns in week ending given date SELECT sr_item_sk, SUM(sr_return_quantity) sr_item_qty FROM store_returns sr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) sr_dateFilter WHERE sr.sr_returned_date_sk = d_date_sk GROUP BY sr_item_sk --across all store and web channels HAVING sr_item_qty > 0 ) fsr, ( --web returns in week ending given date SELECT wr_item_sk, SUM(wr_return_quantity) wr_item_qty FROM web_returns wr, ( -- within the week ending a given date SELECT d1.d_date_sk FROM date_dim d1, date_dim d2 WHERE d1.d_week_seq = d2.d_week_seq AND d2.d_date IN ( '2004-03-8' ,'2004-08-02' ,'2004-11-15', '2004-12-20' ) ) wr_dateFilter WHERE wr.wr_returned_date_sk = d_date_sk GROUP BY wr_item_sk --across all store and web channels HAVING wr_item_qty > 0 ) fwr WHERE fsr.sr_item_sk = fwr.wr_item_sk AND pr.pr_item_sk = fsr.sr_item_sk --extract product_reviews for found items -- equivalent across all store and web channels (within a tolerance of +/- 10%) AND abs( (sr_item_qty-wr_item_qty)/ ((sr_item_qty+wr_item_qty)/2)) <= 0.1 )extractedSentiments WHERE sentiment= 'NEG' --if there are any major negative reviews. ORDER BY item_sk,review_sentence,sentiment,sentiment_word ; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15218) Error: Could not find or load main class org.apache.spark.launcher.Main when run from a directory containing colon ':'
[ https://issues.apache.org/jira/browse/SPARK-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Cecile updated SPARK-15218: Description: {noformat}
mkdir /tmp/qwe:rtz
cd /tmp/qwe:rtz
wget http://www-eu.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz
tar xvzf spark-1.6.1-bin-without-hadoop.tgz
cd spark-1.6.1-bin-without-hadoop/
bin/spark-submit
{noformat} Returns "Error: Could not find or load main class org.apache.spark.launcher.Main". That would not be such an issue if the Mesos executor did not put colons in the generated paths. It means that without hacking (defining a relative SPARK_HOME path myself) there is no way to run a Spark job inside a Mesos job container... Best regards, Adam. was: {noformat}
mkdir /tmp/qwe:rtz
cd /tmp/qwe:rtz
wget http://www-eu.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz
tar xvzf spark-1.6.1-bin-without-hadoop.tgz
cd spark-1.6.1-bin-without-hadoop/
bin/spark-submit
{noformat} Returns "Error: Could not find or load main class org.apache.spark.launcher.Main". That would not be such an issue if the Mesos executor did not put colons in the generated paths. It means that without hacking (defining a relative SPARK_HOME path myself) there is no way to run a Spark job inside a Mesos job container... Best regards, Adam. 
> Error: Could not find or load main class org.apache.spark.launcher.Main when > run from a directory containing colon ':' > -- > > Key: SPARK-15218 > URL: https://issues.apache.org/jira/browse/SPARK-15218 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Shell, Spark Submit >Affects Versions: 1.6.1 >Reporter: Adam Cecile > Labels: mesos > > {noformat} > mkdir /tmp/qwe:rtz > cd /tmp/qwe:rtz > wget > http://www-eu.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz > tar xvzf spark-1.6.1-bin-without-hadoop.tgz > cd spark-1.6.1-bin-without-hadoop/ > bin/spark-submit > {noformat} > Returns "Error: Could not find or load main class > org.apache.spark.launcher.Main". > That would not be such an issue if the Mesos executor did not put colons in > the generated paths. It means that without hacking (defining a relative > SPARK_HOME path myself) there is no way to run a Spark job inside a Mesos job > container... > Best regards, Adam. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15218) Error: Could not find or load main class org.apache.spark.launcher.Main when run from a directory containing colon ':'
Adam Cecile created SPARK-15218: --- Summary: Error: Could not find or load main class org.apache.spark.launcher.Main when run from a directory containing colon ':' Key: SPARK-15218 URL: https://issues.apache.org/jira/browse/SPARK-15218 Project: Spark Issue Type: Bug Components: Spark Core, Spark Shell, Spark Submit Affects Versions: 1.6.1 Reporter: Adam Cecile {noformat}
mkdir /tmp/qwe:rtz
cd /tmp/qwe:rtz
wget http://www-eu.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz
tar xvzf spark-1.6.1-bin-without-hadoop.tgz
cd spark-1.6.1-bin-without-hadoop/
bin/spark-submit
{noformat} Returns "Error: Could not find or load main class org.apache.spark.launcher.Main". That would not be such an issue if the Mesos executor did not put colons in the generated paths. It means that without hacking (defining a relative SPARK_HOME path myself) there is no way to run a Spark job inside a Mesos job container... Best regards, Adam. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
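The failure above follows from how the JVM command line is assembled: on POSIX systems ':' is the classpath separator, so an installation path that itself contains a colon gets split into bogus classpath entries. A minimal illustrative Java sketch (not Spark's actual launcher code; the jar path below is hypothetical):

```java
import java.util.Arrays;

// Illustration only: on POSIX systems the JVM splits the -cp argument on ':'
// (java.io.File.pathSeparator), so an install path containing a colon is
// broken into two bogus classpath entries.
public class ColonClasspath {
    public static void main(String[] args) {
        // Hypothetical classpath built from an install dir named "/tmp/qwe:rtz".
        String classpath = "/tmp/qwe:rtz/spark-1.6.1-bin-without-hadoop/lib/spark-assembly.jar";

        // ":" is File.pathSeparator on Linux/macOS.
        String[] entries = classpath.split(":");
        System.out.println(Arrays.toString(entries));
        // -> [/tmp/qwe, rtz/spark-1.6.1-bin-without-hadoop/lib/spark-assembly.jar]
        // Neither entry exists on disk, hence the JVM reports:
        // "Could not find or load main class org.apache.spark.launcher.Main"
    }
}
```

This also explains the workaround mentioned in the report: a relative SPARK_HOME avoids the colon-containing absolute prefix, so the classpath entries stay intact.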
[jira] [Commented] (SPARK-14057) SQL timestamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275987#comment-15275987 ] Vijay Parmar commented on SPARK-14057: -- I have a few suggestions to make here after looking into the issue and consulting Google and other sources: 1. We can make use of the built-in java.time package, which is available in Java 8 and higher versions (http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/Date.java). In short, a new instance is created to represent the adjusted time zone, whereas the java.util.Date class ignores the time zone in most cases. We could try implementing this package. 2. This code snippet can be handy (note that ZoneId requires region/city identifiers such as "Europe/London"): {code}
ZoneId zoneLondon = ZoneId.of("Europe/London");
ZonedDateTime nowLondon = ZonedDateTime.now(zoneLondon);
ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
ZonedDateTime nowSingapore = nowLondon.withZoneSameInstant(zoneSingapore);
ZonedDateTime nowUTC = nowLondon.withZoneSameInstant(ZoneOffset.UTC);
{code} 3. We also need to look into the SQL-side code to understand how the time is captured and stored once it is received from that end. I will keep looking into the issue and will update you; meanwhile, I await your comment(s) on my suggestions. > SQL timestamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > We have timestamp data. The timestamp data is UTC; however, when we load the > data into Spark data frames, the system assumes the timestamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center onto their local Macs. The data centers run > UTC; their computers are typically in PST or EST. 
> This causes a lot of errors in their analysis. > It is possible to hack around this problem. > A complete description of this issue can be found in the following mail > message: > https://www.mail-archive.com/user@spark.apache.org/msg48121.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
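To make the java.time suggestion in the comment above concrete, here is a small runnable sketch (the zone IDs and sample instant are illustrative, not taken from Spark): a single UTC instant rendered in different zones remains the same point in time, which is the behavior the reporter expects from timestamp handling.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// One instant, rendered in different zones without changing the point in time.
public class TimestampZones {
    public static void main(String[] args) {
        // A fixed UTC instant, standing in for a timestamp read from the data center.
        Instant instant = Instant.parse("2016-03-22T10:00:00Z");

        ZonedDateTime utc = instant.atZone(ZoneOffset.UTC);
        ZonedDateTime pacific = instant.atZone(ZoneId.of("America/Los_Angeles"));

        // Same instant, different wall-clock rendering (PDT is UTC-7 on this date).
        System.out.println(utc);      // 2016-03-22T10:00Z
        System.out.println(pacific);  // 2016-03-22T03:00-07:00[America/Los_Angeles]
        System.out.println(utc.toInstant().equals(pacific.toInstant())); // true
    }
}
```

The data scientists' problem is the reverse direction: loading UTC data on a PST machine silently reinterprets the wall-clock values, whereas `withZoneSameInstant`-style conversion keeps the underlying instant fixed.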