[jira] [Assigned] (SPARK-9548) BytesToBytesMap could have a destructive iterator

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9548:
---

Assignee: (was: Apache Spark)

> BytesToBytesMap could have a destructive iterator
> -
>
> Key: SPARK-9548
> URL: https://issues.apache.org/jira/browse/SPARK-9548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Josh Rosen
>
> BytesToBytesMap.iterator() could be destructive, freeing each page as it 
> moves onto the next one.  There are some circumstances where we don't want a 
> destructive iterator (such as when we're building a KV sorter from a map), so 
> there should be a flag to control this.
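A minimal sketch of the idea (not the actual BytesToBytesMap code; the page representation, the freeing callback, and the flag name below are assumptions for illustration): when the destructive flag is set, each page is released as soon as the iterator has finished consuming it.

{code}
// Sketch only: pages are modelled as in-memory sequences and "freeing" is a callback.
class PageFreeingIterator[T](pages: Seq[Seq[T]], freePage: Seq[T] => Unit, destructive: Boolean)
  extends Iterator[T] {
  private val pageIter = pages.iterator         // assumes every page is non-empty
  private var current: Seq[T] = Seq.empty
  private var pos = 0

  override def hasNext: Boolean = pos < current.length || pageIter.hasNext

  override def next(): T = {
    if (pos >= current.length) {
      // Finished the previous page: free it only when running destructively.
      if (destructive && current.nonEmpty) freePage(current)
      current = pageIter.next()
      pos = 0
    }
    val value = current(pos)
    pos += 1
    value
  }
}
{code}

A caller that still needs the map afterwards, e.g. while building a KV sorter from it, would pass destructive = false.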



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9589) Flaky test: HiveCompatibilitySuite.groupby8

2015-08-04 Thread Davies Liu (JIRA)
Davies Liu created SPARK-9589:
-

 Summary: Flaky test: HiveCompatibilitySuite.groupby8
 Key: SPARK-9589
 URL: https://issues.apache.org/jira/browse/SPARK-9589
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Josh Rosen
Priority: Blocker


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39662/testReport/org.apache.spark.sql.hive.execution/HiveCompatibilitySuite/groupby8/

{code}
sbt.ForkMain$ForkError: 
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 24 in stage 3081.0 failed 1 
times, most recent failure: Lost task 24.0 in stage 3081.0 (TID 14919, 
localhost): java.lang.NullPointerException
at 
org.apache.spark.unsafe.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:226)
at 
org.apache.spark.unsafe.map.BytesToBytesMap$Location.updateAddressesAndSizes(BytesToBytesMap.java:366)
at 
org.apache.spark.unsafe.map.BytesToBytesMap$Location.putNewKey(BytesToBytesMap.java:600)
at 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.getAggregationBuffer(UnsafeFixedWidthAggregationMap.java:134)
at 
org.apache.spark.sql.execution.aggregate.UnsafeHybridAggregationIterator.initialize(UnsafeHybridAggregationIterator.scala:276)
at 
org.apache.spark.sql.execution.aggregate.UnsafeHybridAggregationIterator.(UnsafeHybridAggregationIterator.scala:290)
at 
org.apache.spark.sql.execution.aggregate.UnsafeHybridAggregationIterator$.createFromInputIterator(UnsafeHybridAggregationIterator.scala:358)
at 
org.apache.spark.sql.execution.aggregate.Aggregate$$anonfun$doExecute$1$$anonfun$5.apply(Aggregate.scala:130)
at 
org.apache.spark.sql.execution.aggregate.Aggregate$$anonfun$doExecute$1$$anonfun$5.apply(Aggregate.scala:121)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Assigned] (SPARK-9548) BytesToBytesMap could have a destructive iterator

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9548:
---

Assignee: Apache Spark

> BytesToBytesMap could have a destructive iterator
> -
>
> Key: SPARK-9548
> URL: https://issues.apache.org/jira/browse/SPARK-9548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> BytesToBytesMap.iterator() could be destructive, freeing each page as it 
> moves onto the next one.  There are some circumstances where we don't want a 
> destructive iterator (such as when we're building a KV sorter from a map), so 
> there should be a flag to control this.






[jira] [Commented] (SPARK-9588) spark sql cache: partition level cache eviction

2015-08-04 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653172#comment-14653172
 ] 

Davies Liu commented on SPARK-9588:
---

cc [~lian cheng], Does the partition improvement in 1.5 work in this case?

> spark sql cache: partition level cache eviction
> ---
>
> Key: SPARK-9588
> URL: https://issues.apache.org/jira/browse/SPARK-9588
> Project: Spark
>  Issue Type: Improvement
>Reporter: Shenghu Yang
>
> In Spark 1.4, we can only do 'cache table <table_name>'. However, if we have a 
> table that gets a new partition periodically, say every 10 minutes, we have to 
> 'uncache' and then 'cache' the whole table, which takes a long time.
> Things would be much faster if we could do:
> (1) cache table <table_name> partition <partition_spec>
> (2) uncache table <table_name> partition <partition_spec>
> This way we would always have a sliding-window style of cached data.
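For context, a sketch of today's whole-table workaround (the table name "events" is hypothetical): CACHE TABLE and UNCACHE TABLE are the commands available in 1.4, and they always operate on the entire table.

{code}
// Current workaround (sketch): whenever a new partition arrives, the whole
// table cache has to be dropped and rebuilt, which is what this issue wants to avoid.
sqlContext.sql("UNCACHE TABLE events")
sqlContext.sql("CACHE TABLE events")   // re-reads and re-caches every partition
{code}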






[jira] [Commented] (SPARK-9548) BytesToBytesMap could have a destructive iterator

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653173#comment-14653173
 ] 

Apache Spark commented on SPARK-9548:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/7924

> BytesToBytesMap could have a destructive iterator
> -
>
> Key: SPARK-9548
> URL: https://issues.apache.org/jira/browse/SPARK-9548
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Josh Rosen
>
> BytesToBytesMap.iterator() could be destructive, freeing each page as it 
> moves onto the next one.  There are some circumstances where we don't want a 
> destructive iterator (such as when we're building a KV sorter from a map), so 
> there should be a flag to control this.






[jira] [Commented] (SPARK-9587) Spark Web UI not displaying while changing another network

2015-08-04 Thread Kaveen Raajan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653178#comment-14653178
 ] 

Kaveen Raajan commented on SPARK-9587:
--

I found *SPARK_PUBLIC_DNS* in the Spark configuration for overriding the DNS name, but 
the documentation is very brief; I don't understand what it does.
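For reference, SPARK_PUBLIC_DNS is an environment variable that sets the hostname the driver advertises (for example in web UI links). A minimal sketch of setting it, assuming conf/spark-env.sh is used:

{code}
# conf/spark-env.sh (sketch): advertise a fixed hostname instead of the
# address picked up from the currently connected network.
export SPARK_PUBLIC_DNS=localhost
{code}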

> Spark Web UI not displaying while changing another network
> --
>
> Key: SPARK-9587
> URL: https://issues.apache.org/jira/browse/SPARK-9587
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
> Environment: Windows,
> Hadoop-2.5.2,
>Reporter: Kaveen Raajan
>
> I want to start my spark-shell with localhost instead of an IP address. I'm running 
> spark-shell in yarn-client mode, and my Hadoop is running as a single-node cluster 
> bound to localhost.
> I changed the following properties in spark-default.conf:
> {panel:title=spark-default.conf}
> spark.driver.host    localhost
> spark.driver.hosts   localhost
> {panel}
> Initially, while starting spark-shell, I'm connected to a public network 
> (172.16.xxx.yyy). If I disconnect from that network, Spark jobs keep working without 
> any problem, but the Spark web UI does not.
> The ApplicationMaster always connects using the current IP instead of localhost.
> My log is below:
> {code}
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> 15/08/04 10:17:10 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:10 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58416
> 15/08/04 10:17:10 INFO util.Utils: Successfully started service 'HTTP class 
> server' on port 58416.
> 15/08/04 10:17:15 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.4.0
>   /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/08/04 10:17:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/08/04 10:17:15 INFO Remoting: Starting remoting
> 15/08/04 10:17:16 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@localhost:58439]
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 58439.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/08/04 10:17:16 INFO storage.DiskBlockManager: Created local directory at 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\blockmgr-2c1b95de-936b-44f3-b98d-263c45e310ca
> 15/08/04 10:17:16 INFO storage.MemoryStore: MemoryStore started with capacity 
> 265.4 MB
> 15/08/04 10:17:16 INFO spark.HttpFileServer: HTTP File server directory is 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\httpd-da7b686d-deb0-446d-af20-42ded6d6d035
> 15/08/04 10:17:16 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58440
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 58440.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:4040
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 4040.
> 15/08/04 10:17:16 INFO ui.SparkUI: Started SparkUI at 
> http://172.16.123.123:4040
> 15/08/04 10:17:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 15/08/04 10:17:17 INFO yarn.Client: Requesting a new application from cluster 
> with 1 NodeManagers
> 15/08/04 10:17:17 INFO yarn.Client: Verifying our application has not 
> requested more than the maximum memory capability of the cluster (2048 MB per 
> container)
> 15/08/04 10:17:17 INFO yarn.Client: Will allocate AM contain

[jira] [Comment Edited] (SPARK-9587) Spark Web UI not displaying while changing another network

2015-08-04 Thread Kaveen Raajan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653178#comment-14653178
 ] 

Kaveen Raajan edited comment on SPARK-9587 at 8/4/15 7:08 AM:
--

Hi [~srowen]
Thanks for response,
I found *SPARK_PUBLIC_DNS* in the Spark configuration for overriding the DNS name, but 
the documentation is very brief; I don't understand what it does.


was (Author: kaveenbigdata):
Hi [~srowen]
Thanks for responce,
I found *SPARK_PUBLIC_DNS* in spark configuration to overwrite DNS, but 
document is very simple, I don't know what it do?

> Spark Web UI not displaying while changing another network
> --
>
> Key: SPARK-9587
> URL: https://issues.apache.org/jira/browse/SPARK-9587
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
> Environment: Windows,
> Hadoop-2.5.2,
>Reporter: Kaveen Raajan
>
> I want to start my spark-shell with localhost instead of an IP address. I'm running 
> spark-shell in yarn-client mode, and my Hadoop is running as a single-node cluster 
> bound to localhost.
> I changed the following properties in spark-default.conf:
> {panel:title=spark-default.conf}
> spark.driver.host    localhost
> spark.driver.hosts   localhost
> {panel}
> Initially, while starting spark-shell, I'm connected to a public network 
> (172.16.xxx.yyy). If I disconnect from that network, Spark jobs keep working without 
> any problem, but the Spark web UI does not.
> The ApplicationMaster always connects using the current IP instead of localhost.
> My log is below:
> {code}
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> 15/08/04 10:17:10 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:10 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58416
> 15/08/04 10:17:10 INFO util.Utils: Successfully started service 'HTTP class 
> server' on port 58416.
> 15/08/04 10:17:15 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.4.0
>   /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/08/04 10:17:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/08/04 10:17:15 INFO Remoting: Starting remoting
> 15/08/04 10:17:16 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@localhost:58439]
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 58439.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/08/04 10:17:16 INFO storage.DiskBlockManager: Created local directory at 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\blockmgr-2c1b95de-936b-44f3-b98d-263c45e310ca
> 15/08/04 10:17:16 INFO storage.MemoryStore: MemoryStore started with capacity 
> 265.4 MB
> 15/08/04 10:17:16 INFO spark.HttpFileServer: HTTP File server directory is 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\httpd-da7b686d-deb0-446d-af20-42ded6d6d035
> 15/08/04 10:17:16 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58440
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 58440.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:4040
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 4040.
> 15/08/04 10:17:16 INFO ui.SparkUI: Started SparkUI at 
> http://172.16.123.123:4040
> 15/08/04 10:17:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 15/08/04 10:17:17 INFO yarn.Client: Requesting a new applicat

[jira] [Comment Edited] (SPARK-9587) Spark Web UI not displaying while changing another network

2015-08-04 Thread Kaveen Raajan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653178#comment-14653178
 ] 

Kaveen Raajan edited comment on SPARK-9587 at 8/4/15 7:08 AM:
--

Hi [~srowen]
Thanks for the response,
I found *SPARK_PUBLIC_DNS* in the Spark configuration for overriding the DNS name, but 
the documentation is very brief; I don't understand what it does.


was (Author: kaveenbigdata):
I found *SPARK_PUBLIC_DNS* in spark configuration to overwrite DNS, but 
document is very simple, I don't know what it do?

> Spark Web UI not displaying while changing another network
> --
>
> Key: SPARK-9587
> URL: https://issues.apache.org/jira/browse/SPARK-9587
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
> Environment: Windows,
> Hadoop-2.5.2,
>Reporter: Kaveen Raajan
>
> I want to start my spark-shell with localhost instead of an IP address. I'm running 
> spark-shell in yarn-client mode, and my Hadoop is running as a single-node cluster 
> bound to localhost.
> I changed the following properties in spark-default.conf:
> {panel:title=spark-default.conf}
> spark.driver.host    localhost
> spark.driver.hosts   localhost
> {panel}
> Initially, while starting spark-shell, I'm connected to a public network 
> (172.16.xxx.yyy). If I disconnect from that network, Spark jobs keep working without 
> any problem, but the Spark web UI does not.
> The ApplicationMaster always connects using the current IP instead of localhost.
> My log is below:
> {code}
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> 15/08/04 10:17:10 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:10 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58416
> 15/08/04 10:17:10 INFO util.Utils: Successfully started service 'HTTP class 
> server' on port 58416.
> 15/08/04 10:17:15 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.4.0
>   /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/08/04 10:17:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/08/04 10:17:15 INFO Remoting: Starting remoting
> 15/08/04 10:17:16 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@localhost:58439]
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 58439.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/08/04 10:17:16 INFO storage.DiskBlockManager: Created local directory at 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\blockmgr-2c1b95de-936b-44f3-b98d-263c45e310ca
> 15/08/04 10:17:16 INFO storage.MemoryStore: MemoryStore started with capacity 
> 265.4 MB
> 15/08/04 10:17:16 INFO spark.HttpFileServer: HTTP File server directory is 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\httpd-da7b686d-deb0-446d-af20-42ded6d6d035
> 15/08/04 10:17:16 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58440
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 58440.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:4040
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 4040.
> 15/08/04 10:17:16 INFO ui.SparkUI: Started SparkUI at 
> http://172.16.123.123:4040
> 15/08/04 10:17:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 15/08/04 10:17:17 INFO yarn.Client: Requesting a new application from cluster 
> with 1 NodeMan

[jira] [Created] (SPARK-9590) support metaq to streaming

2015-08-04 Thread zhouxiaoke (JIRA)
zhouxiaoke created SPARK-9590:
-

 Summary: support metaq to streaming
 Key: SPARK-9590
 URL: https://issues.apache.org/jira/browse/SPARK-9590
 Project: Spark
  Issue Type: New Feature
  Components: Streaming
Affects Versions: 1.4.0
Reporter: zhouxiaoke
 Fix For: 1.4.2


Add a function for consuming data from MetaQ in Spark Streaming.
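A minimal sketch of the usual extension point for such a source, a custom Spark Streaming receiver; the MetaQ client calls are left as hypothetical placeholders.

{code}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch only: a receiver that would wrap a MetaQ consumer for the given topic.
class MetaQReceiver(topic: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    new Thread("MetaQ Receiver") { override def run(): Unit = receive() }.start()
  }

  override def onStop(): Unit = {}  // MetaQ consumer shutdown would go here

  private def receive(): Unit = {
    while (!isStopped()) {
      // A real implementation would poll the MetaQ consumer here (hypothetical client code)
      store(s"message from $topic")  // placeholder payload
      Thread.sleep(1000)
    }
  }
}
{code}

It could then be plugged in with ssc.receiverStream(new MetaQReceiver("my-topic")).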






[jira] [Updated] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-04 Thread Manoj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Kumar updated SPARK-9533:
---
Component/s: PySpark

> Add missing methods in Word2Vec ML (Python API)
> ---
>
> Key: SPARK-9533
> URL: https://issues.apache.org/jira/browse/SPARK-9533
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> After SPARK-8874 is resolved, we can add Python wrappers for the same.






[jira] [Commented] (SPARK-5774) Support save RDD append to file

2015-08-04 Thread Murtaza Kanchwala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653287#comment-14653287
 ] 

Murtaza Kanchwala commented on SPARK-5774:
--

Setting spark.hadoop.validateOutputSpecs=false makes Spark overwrite the files. Is 
there any way to configure it to append to the files in the same path instead of 
overwriting them?
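One common workaround, given that there is no append API for saveAsTextFile, is to write each batch to a fresh subdirectory under a common base path (a sketch; the paths below are hypothetical).

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: write each run into its own timestamped directory instead of
// appending to an existing one; downstream readers can load the base path as a whole.
val sc = new SparkContext(new SparkConf().setAppName("append-sketch").setMaster("local[*]"))
val rdd = sc.parallelize(Seq("a", "b", "c"))
val outputDir = s"/tmp/events/batch-${System.currentTimeMillis()}"   // hypothetical base path
rdd.saveAsTextFile(outputDir)   // every run targets a fresh, empty directory
{code}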

> Support save RDD append to file
> ---
>
> Key: SPARK-5774
> URL: https://issues.apache.org/jira/browse/SPARK-5774
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Yanbo Liang
>
> Currently RDD.saveAsTextFile only supports writing to an output path that is empty. In 
> some cases, we need to append an RDD to an existing file. For example, when 
> executing the SQL command "INSERT INTO ...", we need to append the RDD to an 
> existing file.






[jira] [Assigned] (SPARK-9119) In some cases, we may save wrong decimal values to parquet

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9119:
---

Assignee: Davies Liu  (was: Apache Spark)

> In some cases, we may save wrong decimal values to parquet
> --
>
> Key: SPARK-9119
> URL: https://issues.apache.org/jira/browse/SPARK-9119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Davies Liu
>Priority: Blocker
>
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{StructType,StructField,StringType,DecimalType}
> import org.apache.spark.sql.types.Decimal
>
> val schema = StructType(Array(StructField("name", DecimalType(10, 5), false)))
> val rowRDD = sc.parallelize(Array(Row(Decimal("67123.45"))))
> val df = sqlContext.createDataFrame(rowRDD, schema)
> df.registerTempTable("test")
> df.show()
>
> // +--------+
> // |    name|
> // +--------+
> // |67123.45|
> // +--------+
> sqlContext.sql("create table testDecimal as select * from test")
> sqlContext.table("testDecimal").show()
> // +--------+
> // |    name|
> // +--------+
> // |67.12345|
> // +--------+
> {code}
> The problem is that when we do the conversions, we do not use the precision/scale 
> info in the schema.
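A minimal sketch, using plain java.math.BigDecimal rather than Spark's Parquet writer, of why dropping the schema's scale corrupts the value: the same unscaled integer is reinterpreted under a different scale.

{code}
import java.math.{BigDecimal, BigInteger}

// 67123.45 stored with the wrong scale reads back as 67.12345.
val unscaled = BigInteger.valueOf(6712345L)
val intended = new BigDecimal(unscaled, 2)   // 67123.45 (scale 2)
val readBack = new BigDecimal(unscaled, 5)   // 67.12345 (scale 5, as in the table above)
println(s"$intended vs $readBack")
{code}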






[jira] [Commented] (SPARK-8359) Spark SQL Decimal type precision loss on multiplication

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653313#comment-14653313
 ] 

Apache Spark commented on SPARK-8359:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7925

> Spark SQL Decimal type precision loss on multiplication
> ---
>
> Key: SPARK-8359
> URL: https://issues.apache.org/jira/browse/SPARK-8359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Rene Treffer
>Assignee: Davies Liu
>
> It looks like the precision of a decimal cannot be raised beyond ~2^112 
> without causing full value truncation.
> The following code computes powers of two up to a specific exponent:
> {code}
> import org.apache.spark.sql.types.Decimal
> val one = Decimal(1)
> val two = Decimal(2)
> def pow(n : Int) :  Decimal = if (n <= 0) { one } else { 
>   val a = pow(n - 1)
>   a.changePrecision(n,0)
>   two.changePrecision(n,0)
>   a * two
> }
> (109 to 120).foreach(n => 
> println(pow(n).toJavaBigDecimal.unscaledValue.toString))
> 649037107316853453566312041152512
> 1298074214633706907132624082305024
> 2596148429267413814265248164610048
> 5192296858534827628530496329220096
> 1038459371706965525706099265844019
> 2076918743413931051412198531688038
> 4153837486827862102824397063376076
> 8307674973655724205648794126752152
> 1661534994731144841129758825350430
> 3323069989462289682259517650700860
> 6646139978924579364519035301401720
> 1329227995784915872903807060280344
> {code}
> Beyond ~2^112 the value is truncated even though the precision was set to n 
> and should therefore handle 10^n without problems.






[jira] [Commented] (SPARK-9119) In some cases, we may save wrong decimal values to parquet

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653312#comment-14653312
 ] 

Apache Spark commented on SPARK-9119:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7925

> In some cases, we may save wrong decimal values to parquet
> --
>
> Key: SPARK-9119
> URL: https://issues.apache.org/jira/browse/SPARK-9119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Davies Liu
>Priority: Blocker
>
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{StructType,StructField,StringType,DecimalType}
> import org.apache.spark.sql.types.Decimal
>
> val schema = StructType(Array(StructField("name", DecimalType(10, 5), false)))
> val rowRDD = sc.parallelize(Array(Row(Decimal("67123.45"))))
> val df = sqlContext.createDataFrame(rowRDD, schema)
> df.registerTempTable("test")
> df.show()
>
> // +--------+
> // |    name|
> // +--------+
> // |67123.45|
> // +--------+
> sqlContext.sql("create table testDecimal as select * from test")
> sqlContext.table("testDecimal").show()
> // +--------+
> // |    name|
> // +--------+
> // |67.12345|
> // +--------+
> {code}
> The problem is that when we do the conversions, we do not use the precision/scale 
> info in the schema.






[jira] [Assigned] (SPARK-9119) In some cases, we may save wrong decimal values to parquet

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9119:
---

Assignee: Apache Spark  (was: Davies Liu)

> In some cases, we may save wrong decimal values to parquet
> --
>
> Key: SPARK-9119
> URL: https://issues.apache.org/jira/browse/SPARK-9119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Apache Spark
>Priority: Blocker
>
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{StructType,StructField,StringType,DecimalType}
> import org.apache.spark.sql.types.Decimal
>
> val schema = StructType(Array(StructField("name", DecimalType(10, 5), false)))
> val rowRDD = sc.parallelize(Array(Row(Decimal("67123.45"))))
> val df = sqlContext.createDataFrame(rowRDD, schema)
> df.registerTempTable("test")
> df.show()
>
> // +--------+
> // |    name|
> // +--------+
> // |67123.45|
> // +--------+
> sqlContext.sql("create table testDecimal as select * from test")
> sqlContext.table("testDecimal").show()
> // +--------+
> // |    name|
> // +--------+
> // |67.12345|
> // +--------+
> {code}
> The problem is that when we do the conversions, we do not use the precision/scale 
> info in the schema.






[jira] [Created] (SPARK-9591) Job failed for exception during getting Broadcast variable

2015-08-04 Thread jeanlyn (JIRA)
jeanlyn created SPARK-9591:
--

 Summary: Job failed for exception during getting Broadcast variable
 Key: SPARK-9591
 URL: https://issues.apache.org/jira/browse/SPARK-9591
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1, 1.4.0, 1.3.1
Reporter: jeanlyn


A job might fail due to an exception thrown while fetching a broadcast variable, 
especially when dynamic resource allocation is used.
driver log
{noformat}
2015-07-21 05:36:31 INFO 15/07/21 05:36:31 WARN TaskSetManager: Lost task 496.1 
in stage 19.0 (TID 1715, XX): java.io.IOException: Failed to connect to 
X:27072
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused: xx:27072
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
15/07/21 05:36:32 WARN TaskSetManager: Lost task 496.2 in stage 19.0 (TID 1744, 
x): java.io.IOException: Failed to connect to /:34070
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused: xxx:34070
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more

org.apache.spark.SparkException: Job aborted due to stage failure: Task 496 in 
stage 19.0 failed 4 times
{noformat}

executor log
{noformat}
15/07/21 05:36:17 ERROR shuffle.RetryingBlockFetcher: Exception while beginning 
fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to xxx
at 
org.apache.spark.network.client.TransportClientFactory.createClient(Tr

[jira] [Commented] (SPARK-6591) Python data source load options should auto convert common types into strings

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653352#comment-14653352
 ] 

Apache Spark commented on SPARK-6591:
-

User 'yjshen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7926

> Python data source load options should auto convert common types into strings
> -
>
> Key: SPARK-6591
> URL: https://issues.apache.org/jira/browse/SPARK-6591
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Assignee: Yijie Shen
>  Labels: DataFrame, DataSource
>
> See the discussion at : https://github.com/databricks/spark-csv/pull/39
> If the caller invokes
> {code}
> sqlContext.load("com.databricks.spark.csv", path = "cars.csv", header = True)
> {code}
> We should automatically turn header into "true" in string form.
> We should do this for booleans and numeric values.
> cc [~yhuai]






[jira] [Assigned] (SPARK-6591) Python data source load options should auto convert common types into strings

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6591:
---

Assignee: Apache Spark  (was: Yijie Shen)

> Python data source load options should auto convert common types into strings
> -
>
> Key: SPARK-6591
> URL: https://issues.apache.org/jira/browse/SPARK-6591
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>  Labels: DataFrame, DataSource
>
> See the discussion at : https://github.com/databricks/spark-csv/pull/39
> If the caller invokes
> {code}
> sqlContext.load("com.databricks.spark.csv", path = "cars.csv", header = True)
> {code}
> We should automatically turn header into "true" in string form.
> We should do this for booleans and numeric values.
> cc [~yhuai]






[jira] [Assigned] (SPARK-6591) Python data source load options should auto convert common types into strings

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6591:
---

Assignee: Yijie Shen  (was: Apache Spark)

> Python data source load options should auto convert common types into strings
> -
>
> Key: SPARK-6591
> URL: https://issues.apache.org/jira/browse/SPARK-6591
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Reynold Xin
>Assignee: Yijie Shen
>  Labels: DataFrame, DataSource
>
> See the discussion at : https://github.com/databricks/spark-csv/pull/39
> If the caller invokes
> {code}
> sqlContext.load("com.databricks.spark.csv", path = "cars.csv", header = True)
> {code}
> We should automatically turn header into "true" in string form.
> We should do this for booleans and numeric values.
> cc [~yhuai]






[jira] [Created] (SPARK-9592) First and Last aggregates are calculating the values for entire DataFrame partition not on GroupedData partition.

2015-08-04 Thread gaurav (JIRA)
gaurav created SPARK-9592:
-

 Summary: First and Last aggregates are calculating the values for 
entire DataFrame partition not on GroupedData partition.
 Key: SPARK-9592
 URL: https://issues.apache.org/jira/browse/SPARK-9592
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.5.0
Reporter: gaurav
Priority: Minor
 Fix For: 1.5.0


In the current implementation, the First and Last aggregates compute their values over 
the entire DataFrame partition, and the same value is then returned for every group in 
that partition 
(sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala).
The fix: First and Last should compute the first and last value per GroupedData group 
instead of over the entire DataFrame partition.
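A minimal sketch of the expected behaviour (hypothetical data; sqlContext as in a spark-shell session): first and last should be computed per group, not once per DataFrame partition.

{code}
import org.apache.spark.sql.functions.{col, first, last}

val df = sqlContext.createDataFrame(Seq(
  ("a", 1), ("a", 2), ("b", 3), ("b", 4)
)).toDF("key", "value")

// Expected: one first/last value per key, i.e. (a, 1, 2) and (b, 3, 4),
// rather than the same partition-wide value repeated for every group.
df.groupBy("key").agg(first(col("value")), last(col("value"))).show()
{code}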






[jira] [Commented] (SPARK-9591) Job failed for exception during getting Broadcast variable

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653362#comment-14653362
 ] 

Apache Spark commented on SPARK-9591:
-

User 'jeanlyn' has created a pull request for this issue:
https://github.com/apache/spark/pull/7927

> Job failed for exception during getting Broadcast variable
> --
>
> Key: SPARK-9591
> URL: https://issues.apache.org/jira/browse/SPARK-9591
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.0, 1.4.1
>Reporter: jeanlyn
>
> A job might fail due to an exception thrown while fetching a broadcast variable, 
> especially when dynamic resource allocation is used.
> driver log
> {noformat}
> 2015-07-21 05:36:31 INFO 15/07/21 05:36:31 WARN TaskSetManager: Lost task 
> 496.1 in stage 19.0 (TID 1715, XX): java.io.IOException: Failed to 
> connect to X:27072
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xx:27072
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> ... 1 more
> 15/07/21 05:36:32 WARN TaskSetManager: Lost task 496.2 in stage 19.0 (TID 
> 1744, x): java.io.IOException: Failed to connect to /:34070
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xxx:34070
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop

[jira] [Commented] (SPARK-9578) Stemmer feature transformer

2015-08-04 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653360#comment-14653360
 ] 

yuhao yang commented on SPARK-9578:
---

Sharing a great link [http://stackoverflow.com/a/11210358 ] that compares 
different algorithms.


> Stemmer feature transformer
> ---
>
> Key: SPARK-9578
> URL: https://issues.apache.org/jira/browse/SPARK-9578
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Transformer mentioned first in [SPARK-5571] based on suggestion from 
> [~aloknsingh].  Very standard NLP preprocessing task.
> From [~aloknsingh]:
> {quote}
> We have one Scala stemmer in scalanlp%chalk 
> https://github.com/scalanlp/chalk/tree/master/src/main/scala/chalk/text/analyze
>   which can easily be copied (as it is an Apache-licensed project) and is in Scala too.
> I think this will be a better alternative than Lucene's EnglishAnalyzer or 
> OpenNLP.
> Note: we already use scalanlp%breeze via the Maven dependency, so I think 
> adding a scalanlp%chalk dependency is also an option. But, as you said, we 
> can copy the code as it is small.
> {quote}
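To make the preprocessing step concrete, a naive suffix-stripping sketch (not the Porter algorithm and not chalk's implementation) of the per-token function such a transformer would apply:

{code}
import org.apache.spark.sql.functions.udf

// Sketch only: strip a handful of common English suffixes.
def naiveStem(word: String): String = {
  val suffixes = Seq("ing", "ed", "es", "s")
  suffixes.collectFirst {
    case suf if word.endsWith(suf) && word.length > suf.length + 2 => word.dropRight(suf.length)
  }.getOrElse(word)
}

val stemUdf = udf(naiveStem _)
// Usage sketch, assuming a DataFrame `tokens` with a string column "token":
// tokens.withColumn("stem", stemUdf(tokens("token")))
{code}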






[jira] [Assigned] (SPARK-9591) Job failed for exception during getting Broadcast variable

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9591:
---

Assignee: Apache Spark

> Job failed for exception during getting Broadcast variable
> --
>
> Key: SPARK-9591
> URL: https://issues.apache.org/jira/browse/SPARK-9591
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.0, 1.4.1
>Reporter: jeanlyn
>Assignee: Apache Spark
>
> A job might fail due to an exception thrown while fetching a broadcast variable, 
> especially when dynamic resource allocation is used.
> driver log
> {noformat}
> 2015-07-21 05:36:31 INFO 15/07/21 05:36:31 WARN TaskSetManager: Lost task 
> 496.1 in stage 19.0 (TID 1715, XX): java.io.IOException: Failed to 
> connect to X:27072
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xx:27072
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> ... 1 more
> 15/07/21 05:36:32 WARN TaskSetManager: Lost task 496.2 in stage 19.0 (TID 
> 1744, x): java.io.IOException: Failed to connect to /:34070
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xxx:34070
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThread

[jira] [Assigned] (SPARK-9591) Job failed for exception during getting Broadcast variable

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9591:
---

Assignee: (was: Apache Spark)

> Job failed for exception during getting Broadcast variable
> --
>
> Key: SPARK-9591
> URL: https://issues.apache.org/jira/browse/SPARK-9591
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.0, 1.4.1
>Reporter: jeanlyn
>
> A job might fail due to an exception thrown while fetching a broadcast variable, 
> especially when dynamic resource allocation is used.
> driver log
> {noformat}
> 2015-07-21 05:36:31 INFO 15/07/21 05:36:31 WARN TaskSetManager: Lost task 
> 496.1 in stage 19.0 (TID 1715, XX): java.io.IOException: Failed to 
> connect to X:27072
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xx:27072
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> ... 1 more
> 15/07/21 05:36:32 WARN TaskSetManager: Lost task 496.2 in stage 19.0 (TID 
> 1744, x): java.io.IOException: Failed to connect to /:34070
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> at 
> org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
> at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused: xxx:34070
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>

[jira] [Commented] (SPARK-5774) Support save RDD append to file

2015-08-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653374#comment-14653374
 ] 

Sean Owen commented on SPARK-5774:
--

Appending is not even necessarily possible in the underlying HDFS store.
See the discussion above for why this is generally not compatible with Spark's 
semantics anyway.


> Support save RDD append to file
> ---
>
> Key: SPARK-5774
> URL: https://issues.apache.org/jira/browse/SPARK-5774
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.3.0
>Reporter: Yanbo Liang
>
> Currently, RDD.saveAsTextFile only supports writing to a location that is empty. 
> In some cases, we need to append an RDD to an existing file. For example, when 
> executing the SQL command "INSERT INTO ...", we need to append the RDD to an 
> existing file.
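
As a sketch of the workaround this limitation usually forces today (paths are 
hypothetical and purely illustrative), each batch is written to a fresh 
sub-directory and the parent directory is read back as a whole later, because 
saveAsTextFile refuses to write into an existing location:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

public class AppendWorkaround {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("append-workaround").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> batch = sc.parallelize(Arrays.asList("a", "b", "c"));

        // saveAsTextFile fails if the target directory already exists, so the usual
        // workaround is to write every batch to a fresh sub-directory (hypothetical path)
        // and read the whole parent back with a glob later, e.g. "/data/events/*".
        String outputDir = "/data/events/batch-" + System.currentTimeMillis();
        batch.saveAsTextFile(outputDir);

        sc.stop();
    }
}
{code}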



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9593) Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against Hadoop 2.0.0-mr1-cdh4.1.1

2015-08-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-9593:
-

 Summary: Hive ShimLoader loads wrong Hadoop shims when Spark is 
compiled against Hadoop 2.0.0-mr1-cdh4.1.1
 Key: SPARK-9593
 URL: https://issues.apache.org/jira/browse/SPARK-9593
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker


Internally, Hive {{ShimLoader}} tries to load different versions of the Hadoop 
shims by checking version information gathered from the Hadoop jar files.  If the 
major version number is 1, {{Hadoop20SShims}} will be loaded.  Otherwise, if 
the major version number is 2, {{Hadoop23Shims}} will be chosen.  However, CDH 
Hadoop versions like 2.0.0-mr1-cdh4.1.1 have 2 as the major version number but 
contain Hadoop 1 code.  This confuses Hive {{ShimLoader}} and makes it load the 
wrong version of the shims.
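
A simplified sketch of the major-version dispatch described above (this is not 
the actual ShimLoader code; only Hadoop's VersionInfo is a real class here, and 
the shim names are just the ones mentioned in this issue):

{code}
import org.apache.hadoop.util.VersionInfo;

public class ShimSelector {
    // Illustrative version check: pick a shim class name from the Hadoop major version.
    public static String selectShims() {
        String version = VersionInfo.getVersion();   // e.g. "2.0.0-mr1-cdh4.1.1"
        String major = version.split("\\.")[0];
        if ("1".equals(major)) {
            return "Hadoop20SShims";
        } else if ("2".equals(major)) {
            // A 2.0.0-mr1-cdh4.x jar reports major version 2 but ships MR1 code,
            // so this branch picks Hadoop23Shims even though the cluster behaves
            // like Hadoop 1 -- which is exactly the confusion described above.
            return "Hadoop23Shims";
        }
        throw new IllegalArgumentException("Unrecognized Hadoop version: " + version);
    }
}
{code}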



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9593) Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against Hadoop 2.0.0-mr1-cdh4.1.1

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653401#comment-14653401
 ] 

Apache Spark commented on SPARK-9593:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7929

> Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against 
> Hadoop 2.0.0-mr1-cdh4.1.1
> -
>
> Key: SPARK-9593
> URL: https://issues.apache.org/jira/browse/SPARK-9593
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Blocker
>
> Internally, Hive {{ShimLoader}} tries to load different versions of Hadoop 
> shims by checking version information gathered from Hadoop jar files.  If the 
> major version number is 1, {{Hadoop20SShims}} will be loaded.  Otherwise, if 
> the major version number is 2, {{Hadoop23Shims}} will be chosen.  However, 
> CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1 have 2 as major version number, 
> but contain Hadoop 1 code.  This confuses Hive {{ShimLoader}} and loads wrong 
> version of shims.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9593) Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against Hadoop 2.0.0-mr1-cdh4.1.1

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9593:
---

Assignee: Cheng Lian  (was: Apache Spark)

> Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against 
> Hadoop 2.0.0-mr1-cdh4.1.1
> -
>
> Key: SPARK-9593
> URL: https://issues.apache.org/jira/browse/SPARK-9593
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Blocker
>
> Internally, Hive {{ShimLoader}} tries to load different versions of Hadoop 
> shims by checking version information gathered from Hadoop jar files.  If the 
> major version number is 1, {{Hadoop20SShims}} will be loaded.  Otherwise, if 
> the major version number is 2, {{Hadoop23Shims}} will be chosen.  However, 
> CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1 have 2 as major version number, 
> but contain Hadoop 1 code.  This confuses Hive {{ShimLoader}} and loads wrong 
> version of shims.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9593) Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against Hadoop 2.0.0-mr1-cdh4.1.1

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9593:
---

Assignee: Apache Spark  (was: Cheng Lian)

> Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against 
> Hadoop 2.0.0-mr1-cdh4.1.1
> -
>
> Key: SPARK-9593
> URL: https://issues.apache.org/jira/browse/SPARK-9593
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Apache Spark
>Priority: Blocker
>
> Internally, Hive {{ShimLoader}} tries to load different versions of Hadoop 
> shims by checking version information gathered from Hadoop jar files.  If the 
> major version number is 1, {{Hadoop20SShims}} will be loaded.  Otherwise, if 
> the major version number is 2, {{Hadoop23Shims}} will be chosen.  However, 
> CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1 have 2 as major version number, 
> but contain Hadoop 1 code.  This confuses Hive {{ShimLoader}} and loads wrong 
> version of shims.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653403#comment-14653403
 ] 

Apache Spark commented on SPARK-9533:
-

User 'MechCoder' has created a pull request for this issue:
https://github.com/apache/spark/pull/7930

> Add missing methods in Word2Vec ML (Python API)
> ---
>
> Key: SPARK-9533
> URL: https://issues.apache.org/jira/browse/SPARK-9533
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> After 8874 is resolved, we can add python wrappers for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9533:
---

Assignee: (was: Apache Spark)

> Add missing methods in Word2Vec ML (Python API)
> ---
>
> Key: SPARK-9533
> URL: https://issues.apache.org/jira/browse/SPARK-9533
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Manoj Kumar
>Priority: Minor
>
> After 8874 is resolved, we can add python wrappers for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9533) Add missing methods in Word2Vec ML (Python API)

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9533:
---

Assignee: Apache Spark

> Add missing methods in Word2Vec ML (Python API)
> ---
>
> Key: SPARK-9533
> URL: https://issues.apache.org/jira/browse/SPARK-9533
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Manoj Kumar
>Assignee: Apache Spark
>Priority: Minor
>
> After 8874 is resolved, we can add python wrappers for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Poorvi Lashkary (JIRA)
Poorvi Lashkary created SPARK-9594:
--

 Summary:  Failed to get broadcast_33_piece0 while using 
Accumulators in UDF
 Key: SPARK-9594
 URL: https://issues.apache.org/jira/browse/SPARK-9594
 Project: Spark
  Issue Type: Test
 Environment: Amazon Linux AMI release 2014.09
Reporter: Poorvi Lashkary


Getting the below exception while using an accumulator in a UDF.

 java.io.IOException: org.apache.spark.SparkException: Failed to get 
broadcast_33_piece0 of broadcast_33
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
of broadcast_33
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at 
org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
... 11 more




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9593) Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against Hadoop 2.0.0-mr1-cdh4.1.1

2015-08-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653419#comment-14653419
 ] 

Sean Owen commented on SPARK-9593:
--

This isn't quite right: the method that the shim loader is looking for was 
added in Hadoop 2.1, not 2.0. The CDH release is technically fine in this 
regard. I'll post more on the PR.

> Hive ShimLoader loads wrong Hadoop shims when Spark is compiled against 
> Hadoop 2.0.0-mr1-cdh4.1.1
> -
>
> Key: SPARK-9593
> URL: https://issues.apache.org/jira/browse/SPARK-9593
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Blocker
>
> Internally, Hive {{ShimLoader}} tries to load different versions of Hadoop 
> shims by checking version information gathered from Hadoop jar files.  If the 
> major version number is 1, {{Hadoop20SShims}} will be loaded.  Otherwise, if 
> the major version number is 2, {{Hadoop23Shims}} will be chosen.  However, 
> CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1 have 2 as major version number, 
> but contain Hadoop 1 code.  This confuses Hive {{ShimLoader}} and loads wrong 
> version of shims.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Poorvi Lashkary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653440#comment-14653440
 ] 

Poorvi Lashkary commented on SPARK-9594:


Use case: I need to create an auto-increment sequence column for a data frame. I 
have created a UDF which updates the value of an accumulator and returns that 
value. I used an accumulator so that executors can share the same value.
Please find below a sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
startCtr.setValue(l);
return l;
}
},DataTypes.IntegerType);
System.out.println("recors:"+seqCount);
Query "Select seq("+start.value()+") as ID from df";

>  Failed to get broadcast_33_piece0 while using Accumulators in UDF
> --
>
> Key: SPARK-9594
> URL: https://issues.apache.org/jira/browse/SPARK-9594
> Project: Spark
>  Issue Type: Test
> Environment: Amazon Linux AMI release 2014.09
>Reporter: Poorvi Lashkary
>
> Getting Below Exception while using accumulator in a UDF.
>  java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_33_piece0 of broadcast_33
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
> of broadcast_33
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
> ... 11 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Poorvi Lashkary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653440#comment-14653440
 ] 

Poorvi Lashkary edited comment on SPARK-9594 at 8/4/15 10:41 AM:
-

Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
startCtr.setValue(l);
return l;
}
},DataTypes.IntegerType);

Query "Select seq("+start.value()+") as ID from df";


was (Author: poorvi_767):
Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
startCtr.setValue(l);
return l;
}
},DataTypes.IntegerType);
System.out.println("recors:"+seqCount);
Query "Select seq("+start.value()+") as ID from df";

>  Failed to get broadcast_33_piece0 while using Accumulators in UDF
> --
>
> Key: SPARK-9594
> URL: https://issues.apache.org/jira/browse/SPARK-9594
> Project: Spark
>  Issue Type: Test
> Environment: Amazon Linux AMI release 2014.09
>Reporter: Poorvi Lashkary
>
> Getting Below Exception while using accumulator in a UDF.
>  java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_33_piece0 of broadcast_33
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
> of broadcast_33
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
> at org.apache.spark.util.Utils$.tryOrIOExceptio

[jira] [Comment Edited] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Poorvi Lashkary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653440#comment-14653440
 ] 

Poorvi Lashkary edited comment on SPARK-9594 at 8/4/15 10:42 AM:
-

Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
start.setValue(l);
return l;
}
},DataTypes.IntegerType);

Query "Select seq("+start.value()+") as ID from df";


was (Author: poorvi_767):
Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
startCtr.setValue(l);
return l;
}
},DataTypes.IntegerType);

Query "Select seq("+start.value()+") as ID from df";

>  Failed to get broadcast_33_piece0 while using Accumulators in UDF
> --
>
> Key: SPARK-9594
> URL: https://issues.apache.org/jira/browse/SPARK-9594
> Project: Spark
>  Issue Type: Test
> Environment: Amazon Linux AMI release 2014.09
>Reporter: Poorvi Lashkary
>
> Getting Below Exception while using accumulator in a UDF.
>  java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_33_piece0 of broadcast_33
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
> of broadcast_33
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
> ... 11 more



-

[jira] [Comment Edited] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Poorvi Lashkary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653440#comment-14653440
 ] 

Poorvi Lashkary edited comment on SPARK-9594 at 8/4/15 10:42 AM:
-

Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = start.value() + 1;
start.setValue(l);
return l;
}
},DataTypes.IntegerType);

Query "Select seq("+start.value()+") as ID from df";


was (Author: poorvi_767):
Use case: I need to create auto increment sequence column for a data frame. I 
have created a UDF which updates value of accumulator and returns that value. 
Used accumulators so that executors can share the same.
PFB sample code snippet:
static Accumulator start  = sc.accumulator(0);
SQLContext.udf().register("seq",new UDF1(){
public Integer call(Integer l) throws 
Exception{
l = startCtr.value() + 1;
start.setValue(l);
return l;
}
},DataTypes.IntegerType);

Query "Select seq("+start.value()+") as ID from df";

>  Failed to get broadcast_33_piece0 while using Accumulators in UDF
> --
>
> Key: SPARK-9594
> URL: https://issues.apache.org/jira/browse/SPARK-9594
> Project: Spark
>  Issue Type: Test
> Environment: Amazon Linux AMI release 2014.09
>Reporter: Poorvi Lashkary
>
> Getting Below Exception while using accumulator in a UDF.
>  java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_33_piece0 of broadcast_33
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
> of broadcast_33
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
> ... 11 more



--
This

[jira] [Updated] (SPARK-9359) Support IntervalType for Parquet

2015-08-04 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-9359:
--
Assignee: Liang-Chi Hsieh

> Support IntervalType for Parquet
> 
>
> Key: SPARK-9359
> URL: https://issues.apache.org/jira/browse/SPARK-9359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Liang-Chi Hsieh
>
> SPARK-8753 introduced {{IntervalType}} which corresponds to Parquet 
> {{INTERVAL}} logical type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9359) Support IntervalType for Parquet

2015-08-04 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-9359:
--
Shepherd: Cheng Lian

> Support IntervalType for Parquet
> 
>
> Key: SPARK-9359
> URL: https://issues.apache.org/jira/browse/SPARK-9359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Cheng Lian
>Assignee: Liang-Chi Hsieh
>
> SPARK-8753 introduced {{IntervalType}} which corresponds to Parquet 
> {{INTERVAL}} logical type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9534) Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9534.
--
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7862
[https://github.com/apache/spark/pull/7862]

> Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 
> edition
> ---
>
> Key: SPARK-9534
> URL: https://issues.apache.org/jira/browse/SPARK-9534
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.5.0
>
>
> For parity with the kinds of warnings scalac emits, we should turn on some of 
> javac's lint options. This reports, for example, use of deprecated APIs and 
> unchecked casts, as scalac does.
> It's also a good time to sweep through build warnings and fix a bunch before 
> the release.
> A PR is coming which shows and explains the fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9592) First and Last aggregates are calculating the values for entire DataFrame partition not on GroupedData partition.

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653502#comment-14653502
 ] 

Apache Spark commented on SPARK-9592:
-

User 'ggupta81' has created a pull request for this issue:
https://github.com/apache/spark/pull/7928

> First and Last aggregates are calculating the values for entire DataFrame 
> partition not on GroupedData partition.
> -
>
> Key: SPARK-9592
> URL: https://issues.apache.org/jira/browse/SPARK-9592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.5.0
>Reporter: gaurav
>Priority: Minor
> Fix For: 1.5.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> In the current implementation, the First and Last aggregates calculate their 
> values over the entire DataFrame partition, and the same value is then returned 
> for all GroupedData groups in the partition.
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
> Fix: the First and Last aggregates should compute the first and last value per 
> GroupedData group instead of over the entire DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9592) First and Last aggregates are calculating the values for entire DataFrame partition not on GroupedData partition.

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9592:
---

Assignee: (was: Apache Spark)

> First and Last aggregates are calculating the values for entire DataFrame 
> partition not on GroupedData partition.
> -
>
> Key: SPARK-9592
> URL: https://issues.apache.org/jira/browse/SPARK-9592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.5.0
>Reporter: gaurav
>Priority: Minor
> Fix For: 1.5.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> In the current implementation, the First and Last aggregates calculate their 
> values over the entire DataFrame partition, and the same value is then returned 
> for all GroupedData groups in the partition.
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
> Fix: the First and Last aggregates should compute the first and last value per 
> GroupedData group instead of over the entire DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9592) First and Last aggregates are calculating the values for entire DataFrame partition not on GroupedData partition.

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9592:
---

Assignee: Apache Spark

> First and Last aggregates are calculating the values for entire DataFrame 
> partition not on GroupedData partition.
> -
>
> Key: SPARK-9592
> URL: https://issues.apache.org/jira/browse/SPARK-9592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.5.0
>Reporter: gaurav
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 1.5.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> In the current implementation, the First and Last aggregates calculate their 
> values over the entire DataFrame partition, and the same value is then returned 
> for all GroupedData groups in the partition.
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
> Fix: the First and Last aggregates should compute the first and last value per 
> GroupedData group instead of over the entire DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9592) First and Last aggregates are calculating the values for entire DataFrame partition not on GroupedData partition.

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9592:
-
Affects Version/s: (was: 1.5.0)
 Target Version/s:   (was: 1.4.0)
Fix Version/s: (was: 1.5.0)

[~ggupta81] don't set Fix/Target Version
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

> First and Last aggregates are calculating the values for entire DataFrame 
> partition not on GroupedData partition.
> -
>
> Key: SPARK-9592
> URL: https://issues.apache.org/jira/browse/SPARK-9592
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: gaurav
>Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> In the current implementation, the First and Last aggregates calculate their 
> values over the entire DataFrame partition, and the same value is then returned 
> for all GroupedData groups in the partition.
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
> Fix: the First and Last aggregates should compute the first and last value per 
> GroupedData group instead of over the entire DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8131) Improve Database support

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8131:
-
Assignee: Cheng Lian

> Improve Database support
> 
>
> Key: SPARK-8131
> URL: https://issues.apache.org/jira/browse/SPARK-8131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.5.0
>
>
> This is the master jira for tracking the improvement on database support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9255) SQL codegen fails with "value < is not a member of TimestampType.this.InternalType"

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9255:
-
Assignee: Reynold Xin

> SQL codegen fails with "value < is not a member of 
> TimestampType.this.InternalType"
> ---
>
> Key: SPARK-9255
> URL: https://issues.apache.org/jira/browse/SPARK-9255
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
> Environment: Redhat Linux, Java 8.0 and Spark 1.4.1 release.
>Reporter: Paul Wu
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
> Attachments: timestamp_bug2.zip, tstest
>
>
> Updates: This issue is due to the following config: 
> spark.sql.codegen   true
> If this parameter is set to false, the problem does not happen. The bug was 
> introduced in 1.4.0.  Releases 1.3.0 and 1.3.1 do not have this issue.
> ===
> This is a very strange case involving timestamps.  I can run the program on 
> Windows using the dev pom.xml (1.4.1) or the 1.4.1 or 1.3.0 release downloaded 
> from Apache without issues, but when I run it on the Spark 1.4.1 release, either 
> downloaded from Apache or built with Scala 2.11, on Red Hat Linux, 
> it produces the following error (the code I used is after this stack trace):
> 15/07/22 12:02:50  ERROR Executor 96: Exception in task 0.0 in stage 0.0 (TID 
> 0)
> java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: 
> reflective compilation has failed:
> value < is not a member of TimestampType.this.InternalType
> at 
> org.spark-project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
> at 
> org.spark-project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
> at 
> org.spark-project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at 
> org.spark-project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
> at 
> org.spark-project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
> at 
> org.spark-project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
> at 
> org.spark-project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at 
> org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
> at 
> org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at 
> org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:105)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:102)
> at 
> org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:170)
> at 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:261)
> at 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$9.apply(GeneratedAggregate.scala:246)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: scala.tools.reflect.ToolBoxError: reflective compilation has 
> failed:
> value < is not a member of TimestampType.this.InternalType
> at 
> scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.throwIfErrors(ToolBoxFactory.scala:316)
> at 
> scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCom

[jira] [Updated] (SPARK-9594) Failed to get broadcast_33_piece0 while using Accumulators in UDF

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9594:
-
Target Version/s:   (was: 1.3.1)
Priority: Minor  (was: Major)
 Component/s: SQL

[~poorvi_767] have a look at 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark  Some 
of the JIRA fields aren't set correctly

>  Failed to get broadcast_33_piece0 while using Accumulators in UDF
> --
>
> Key: SPARK-9594
> URL: https://issues.apache.org/jira/browse/SPARK-9594
> Project: Spark
>  Issue Type: Test
>  Components: SQL
> Environment: Amazon Linux AMI release 2014.09
>Reporter: Poorvi Lashkary
>Priority: Minor
>
> Getting Below Exception while using accumulator in a UDF.
>  java.io.IOException: org.apache.spark.SparkException: Failed to get 
> broadcast_33_piece0 of broadcast_33
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1156)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_33_piece0 
> of broadcast_33
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1153)
> ... 11 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9574) Review the contents of uber JARs spark-streaming-XXX-assembly

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9574:
-
Component/s: Streaming
 Issue Type: Task  (was: Bug)

> Review the contents of uber JARs spark-streaming-XXX-assembly
> -
>
> Key: SPARK-9574
> URL: https://issues.apache.org/jira/browse/SPARK-9574
> Project: Spark
>  Issue Type: Task
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Shixiong Zhu
>
> It should not contain Spark core and its dependencies, especially the 
> following.
> - Hadoop and its dependencies
> - Scala libraries



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9573) Forward exceptions in batch jobs to the awaitTermination thread

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9573:
-
Component/s: Streaming

> Forward exceptions in batch jobs to the awaitTermination thread
> ---
>
> Key: SPARK-9573
> URL: https://issues.apache.org/jira/browse/SPARK-9573
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Currently, if a batch fails with an exception, the failure is only logged in 
> the background and the system moves on. This is okay in many use cases, but in 
> some (especially when you are debugging), the user wants to see all 
> the failures. This JIRA is for failing the context on a batch failure 
> when a SparkConf flag (say, spark.streaming.failContextOnBatchFailure) is set 
> to true. The default should be false for Spark 1.5.0. 
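
From the application side, the proposal could be used roughly as sketched below. 
The flag name is the placeholder from the description and does not exist yet, and 
the socket source is just a stand-in input:

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class FailFastStreamingApp {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
            .setAppName("fail-fast-streaming")
            .setMaster("local[2]")
            // Placeholder flag name taken from the proposal above; not an existing setting.
            .set("spark.streaming.failContextOnBatchFailure", "true");

        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Stand-in input and output operation so the context has something to run.
        jssc.socketTextStream("localhost", 9999).print();

        jssc.start();
        try {
            // With the proposed behaviour, a batch failure would surface here
            // instead of only being logged in the background.
            jssc.awaitTermination();
        } catch (Exception e) {
            System.err.println("Streaming context stopped due to batch failure: " + e);
            throw e;
        }
    }
}
{code}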



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9588) spark sql cache: partition level cache eviction

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9588:
-
Component/s: SQL

> spark sql cache: partition level cache eviction
> ---
>
> Key: SPARK-9588
> URL: https://issues.apache.org/jira/browse/SPARK-9588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shenghu Yang
>
> In Spark 1.4, we can only do 'cache table <table_name>'. However, if we have a 
> table which gets a new partition periodically, say every 10 minutes, we 
> have to 'uncache' and then 'cache' the whole table, which takes a long time.
> Things would be much faster if we could do:
> (1) cache table <table_name> partition <partition_spec>
> (2) uncache table <table_name> partition <partition_spec>
> This way we will always have a sliding-window style of cached data.
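
To make the proposal concrete, a sketch through the SQL interface (assuming a 
partitioned table named "events" already exists; the PARTITION syntax shown in 
comments is the proposed form, not existing syntax):

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class PartitionCacheSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("partition-cache-sketch").setMaster("local[2]"));
        HiveContext hive = new HiveContext(sc.sc());

        // Today: the whole table has to be dropped from and re-loaded into the cache
        // whenever a new partition lands.
        hive.sql("UNCACHE TABLE events");
        hive.sql("CACHE TABLE events");

        // Proposed (not existing syntax): touch only the affected partitions.
        // hive.sql("CACHE TABLE events PARTITION (dt = '2015-08-04-10-10')");
        // hive.sql("UNCACHE TABLE events PARTITION (dt = '2015-08-04-09-00')");

        sc.stop();
    }
}
{code}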



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9595) Adding API to SparkConf for kryo serializers registration

2015-08-04 Thread John Chen (JIRA)
John Chen created SPARK-9595:


 Summary: Adding API to SparkConf for kryo serializers registration
 Key: SPARK-9595
 URL: https://issues.apache.org/jira/browse/SPARK-9595
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.1, 1.3.1
Reporter: John Chen
Priority: Minor


Currently SparkConf has a registerKryoClasses API for Kryo registration. 
However, this only works when you register classes. If you want to register 
customized Kryo serializers, you have to extend the KryoSerializer class and 
write some code.

This is not only inconvenient, but also requires the registration to be 
done at compile time, which is not always possible. Thus, I suggest adding another 
API to SparkConf for registering customized Kryo serializers. It could look like this:

def registerKryoSerializers(serializers: Map[Class[_], Serializer]): SparkConf
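
For reference, the registration path that exists today can be sketched as below. 
The domain class and serializer names are hypothetical; this only illustrates the 
compile-time approach the description refers to, using Spark's KryoRegistrator 
hook rather than a direct SparkConf call:

{code}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical domain class and custom serializer, for illustration only.
class Point {
    int x, y;
}

class PointSerializer extends Serializer<Point> {
    @Override
    public void write(Kryo kryo, Output output, Point p) {
        output.writeInt(p.x);
        output.writeInt(p.y);
    }

    @Override
    public Point read(Kryo kryo, Input input, Class<Point> type) {
        Point p = new Point();
        p.x = input.readInt();
        p.y = input.readInt();
        return p;
    }
}

// The compile-time registration step: a registrator class wired up through
// configuration rather than a direct SparkConf call.
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(Point.class, new PointSerializer());
    }
}
{code}

The registrator is then enabled via configuration, e.g. 
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") and 
conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName()), which is 
the indirection the proposed registerKryoSerializers API would avoid.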



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-9587) Spark Web UI not displaying while changing another network

2015-08-04 Thread Kaveen Raajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaveen Raajan closed SPARK-9587.

Resolution: Not A Problem

This works if I *set SPARK_LOCAL_HOSTNAME={COMPUTERNAME}*. The Spark 
driver and web UI addresses then refer to the hostname instead of the IP.

> Spark Web UI not displaying while changing another network
> --
>
> Key: SPARK-9587
> URL: https://issues.apache.org/jira/browse/SPARK-9587
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
> Environment: Windows,
> Hadoop-2.5.2,
>Reporter: Kaveen Raajan
>
> I want to start my spark-shell with localhost instead of an IP. I'm running 
> spark-shell in yarn-client mode. My Hadoop is running as a single-node cluster 
> connecting with localhost.
> I changed the following property in spark-default.conf 
> {panel:title=spark-default.conf}
> spark.driver.host    localhost
> spark.driver.hosts   localhost
> {panel}
> Initially, while starting spark-shell, I'm connected to some public network 
> (172.16.xxx.yyy). If I disconnect from the network, Spark jobs keep working 
> without any problem, but the Spark web UI does not work.
> ApplicationMaster always connects with the current IP instead of localhost.
> My log are here
> {code}
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> 15/08/04 10:17:10 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:10 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58416
> 15/08/04 10:17:10 INFO util.Utils: Successfully started service 'HTTP class 
> server' on port 58416.
> 15/08/04 10:17:15 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.4.0
>   /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/08/04 10:17:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/08/04 10:17:15 INFO Remoting: Starting remoting
> 15/08/04 10:17:16 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@localhost:58439]
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 58439.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/08/04 10:17:16 INFO storage.DiskBlockManager: Created local directory at 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\blockmgr-2c1b95de-936b-44f3-b98d-263c45e310ca
> 15/08/04 10:17:16 INFO storage.MemoryStore: MemoryStore started with capacity 
> 265.4 MB
> 15/08/04 10:17:16 INFO spark.HttpFileServer: HTTP File server directory is 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\httpd-da7b686d-deb0-446d-af20-42ded6d6d035
> 15/08/04 10:17:16 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58440
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 58440.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:4040
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 4040.
> 15/08/04 10:17:16 INFO ui.SparkUI: Started SparkUI at 
> http://172.16.123.123:4040
> 15/08/04 10:17:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 15/08/04 10:17:17 INFO yarn.Client: Requesting a new application from cluster 
> with 1 NodeManagers
> 15/08/04 10:17:17 INFO yarn.Client: Verifying our application has not 
> requested more than the maximum memory capability of the cluster (2048 MB per 
> container)
> 15/08/04 10:17:17 INFO yarn.Client: Will allocate AM container, with 

[jira] [Created] (SPARK-9596) We better not load Hadoop classes again

2015-08-04 Thread Tao Wang (JIRA)
Tao Wang created SPARK-9596:
---

 Summary: We better not load Hadoop classes again
 Key: SPARK-9596
 URL: https://issues.apache.org/jira/browse/SPARK-9596
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Tao Wang


Some Hadoop classes carry global state, such as the authentication information 
in UserGroupInformation. If we load them again in `IsolatedClientLoader`, the 
state they carry is lost.

So we should treat Hadoop classes as "shared" too.
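
As a rough illustration of what treating them as "shared" could look like (a 
minimal sketch; the class name and package prefixes below are illustrative, not 
Spark's actual IsolatedClientLoader code):

{code}
// Sketch: delegate "shared" classes (Hadoop, JDK, Scala) to the parent classloader so
// that JVM-global state such as UserGroupInformation's logged-in user is reused,
// while everything else is loaded in isolation from this loader's own URLs.
class SketchIsolatedClassLoader(urls: Array[java.net.URL], parent: ClassLoader)
  extends java.net.URLClassLoader(urls, null) {

  private def isSharedClass(name: String): Boolean =
    name.startsWith("org.apache.hadoop.") ||  // e.g. org.apache.hadoop.security.UserGroupInformation
    name.startsWith("java.") ||
    name.startsWith("scala.")

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    if (isSharedClass(name)) parent.loadClass(name)   // reuse the already-initialized class
    else super.loadClass(name, resolve)               // load an isolated copy
}
{code}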



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9587) Spark Web UI not displaying while changing another network

2015-08-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653573#comment-14653573
 ] 

Sean Owen commented on SPARK-9587:
--

OK, I think you may be asking the same question as in 
https://issues.apache.org/jira/browse/SPARK-8982 then

> Spark Web UI not displaying while changing another network
> --
>
> Key: SPARK-9587
> URL: https://issues.apache.org/jira/browse/SPARK-9587
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
> Environment: Windows,
> Hadoop-2.5.2,
>Reporter: Kaveen Raajan
>
> I want to start my spark-shell with localhost instead of the IP address. I'm 
> running spark-shell in yarn-client mode. My Hadoop runs as a single-node 
> cluster bound to localhost.
> I changed the following properties in spark-default.conf:
> {panel:title=spark-default.conf}
> spark.driver.host    localhost
> spark.driver.hosts   localhost
> {panel}
> Initially I start spark-shell while connected to a public network 
> (172.16.xxx.yyy). If I disconnect from the network, Spark jobs keep working 
> without any problem, but the Spark web UI does not.
> The ApplicationMaster always connects to the current IP instead of localhost.
> My log is here:
> {code}
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:10 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> 15/08/04 10:17:10 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:10 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:10 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58416
> 15/08/04 10:17:10 INFO util.Utils: Successfully started service 'HTTP class 
> server' on port 58416.
> 15/08/04 10:17:15 INFO spark.SparkContext: Running Spark version 1.4.0
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing view acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: Changing modify acls to: SYSTEM
> 15/08/04 10:17:15 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(SYSTEM); users 
> with modify permissions: Set(SYSTEM)
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.4.0
>   /_/
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/08/04 10:17:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/08/04 10:17:15 INFO Remoting: Starting remoting
> 15/08/04 10:17:16 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkDriver@localhost:58439]
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'sparkDriver' 
> on port 58439.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering MapOutputTracker
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering BlockManagerMaster
> 15/08/04 10:17:16 INFO storage.DiskBlockManager: Created local directory at 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\blockmgr-2c1b95de-936b-44f3-b98d-263c45e310ca
> 15/08/04 10:17:16 INFO storage.MemoryStore: MemoryStore started with capacity 
> 265.4 MB
> 15/08/04 10:17:16 INFO spark.HttpFileServer: HTTP File server directory is 
> C:\Windows\Temp\spark-86221988-7e8b-4340-be80-a2be283845e3\httpd-da7b686d-deb0-446d-af20-42ded6d6d035
> 15/08/04 10:17:16 INFO spark.HttpServer: Starting HTTP Server
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SocketConnector@0.0.0.0:58440
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'HTTP file 
> server' on port 58440.
> 15/08/04 10:17:16 INFO spark.SparkEnv: Registering OutputCommitCoordinator
> 15/08/04 10:17:16 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/08/04 10:17:16 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:4040
> 15/08/04 10:17:16 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 4040.
> 15/08/04 10:17:16 INFO ui.SparkUI: Started SparkUI at 
> http://172.16.123.123:4040
> 15/08/04 10:17:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 15/08/04 10:17:17 INFO yarn.Client: Requesting a new application from cluster 
> with 1 NodeManagers
> 15/08/04 10:17:17 INFO yarn.Client: Verifying our application has not 
> requested more than the maximum memory capability of the cluster (2048 MB per 
> container)
> 15/08/04 10:17:17 INFO yarn.Client: Will allocate AM container, with 896 MB 
> mem

[jira] [Updated] (SPARK-9596) Avoid reloading Hadoop classes like UserGroupInformation

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9596:
-
Summary: Avoid reloading Hadoop classes like UserGroupInformation  (was: We 
better not load Hadoop classes again)

> Avoid reloading Hadoop classes like UserGroupInformation
> 
>
> Key: SPARK-9596
> URL: https://issues.apache.org/jira/browse/SPARK-9596
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tao Wang
>
> Some Hadoop classes carry global state, such as the authentication information 
> in UserGroupInformation. If we load them again in `IsolatedClientLoader`, the 
> state they carry is lost.
> So we should treat Hadoop classes as "shared" too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9596) Avoid reloading Hadoop classes like UserGroupInformation

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9596:
---

Assignee: Apache Spark

> Avoid reloading Hadoop classes like UserGroupInformation
> 
>
> Key: SPARK-9596
> URL: https://issues.apache.org/jira/browse/SPARK-9596
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tao Wang
>Assignee: Apache Spark
>
> Some Hadoop classes carry global state, such as the authentication information 
> in UserGroupInformation. If we load them again in `IsolatedClientLoader`, the 
> state they carry is lost.
> So we should treat Hadoop classes as "shared" too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9596) Avoid reloading Hadoop classes like UserGroupInformation

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653576#comment-14653576
 ] 

Apache Spark commented on SPARK-9596:
-

User 'WangTaoTheTonic' has created a pull request for this issue:
https://github.com/apache/spark/pull/7931

> Avoid reloading Hadoop classes like UserGroupInformation
> 
>
> Key: SPARK-9596
> URL: https://issues.apache.org/jira/browse/SPARK-9596
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tao Wang
>
> Some Hadoop classes carry global state, such as the authentication information 
> in UserGroupInformation. If we load them again in `IsolatedClientLoader`, the 
> state they carry is lost.
> So we should treat Hadoop classes as "shared" too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9596) Avoid reloading Hadoop classes like UserGroupInformation

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9596:
---

Assignee: (was: Apache Spark)

> Avoid reloading Hadoop classes like UserGroupInformation
> 
>
> Key: SPARK-9596
> URL: https://issues.apache.org/jira/browse/SPARK-9596
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tao Wang
>
> Some Hadoop classes carry global state, such as the authentication information 
> in UserGroupInformation. If we load them again in `IsolatedClientLoader`, the 
> state they carry is lost.
> So we should treat Hadoop classes as "shared" too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-2016:
--
Assignee: Carson Wang

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Reynold Xin
>Assignee: Carson Wang
>  Labels: starter
>
> Try running
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-2016:
--
Fix Version/s: 1.5.0

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Reynold Xin
>Assignee: Carson Wang
>  Labels: starter
> Fix For: 1.5.0
>
>
> Try running
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-2016.
---
Resolution: Fixed

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Reynold Xin
>Assignee: Carson Wang
>  Labels: starter
> Fix For: 1.5.0
>
>
> Try running
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9597) Spark Streaming + MQTT Integration Guide

2015-08-04 Thread Prabeesh K (JIRA)
Prabeesh K created SPARK-9597:
-

 Summary: Spark Streaming + MQTT Integration Guide
 Key: SPARK-9597
 URL: https://issues.apache.org/jira/browse/SPARK-9597
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Prabeesh K






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2016) rdd in-memory storage UI becomes unresponsive when the number of RDD partitions is large

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-2016:
--
Affects Version/s: 1.5.0

> rdd in-memory storage UI becomes unresponsive when the number of RDD 
> partitions is large
> 
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.5.0
>Reporter: Reynold Xin
>Assignee: Carson Wang
>  Labels: starter
> Fix For: 1.5.0
>
>
> Try running
> {code}
> sc.parallelize(1 to 100, 100).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few 
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9597) Spark Streaming + MQTT Integration Guide

2015-08-04 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-9597:
--
Description: Add Spark Streaming + 

> Spark Streaming + MQTT Integration Guide
> 
>
> Key: SPARK-9597
> URL: https://issues.apache.org/jira/browse/SPARK-9597
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Prabeesh K
>
> Add Spark Streaming + 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9597) Spark Streaming + MQTT Integration Guide

2015-08-04 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-9597:
--
Description: 
Add Spark Streaming + MQTT Integration Guide like
[Spark Streaming + Flume Integration 
Guide|http://spark.apache.org/docs/latest/streaming-flume-integration.html]
[Spark Streaming + Kinesis 
Integration|http://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
[Spark Streaming + Kafka Integration 
Guide|http://spark.apache.org/docs/latest/streaming-kafka-integration.html]


  was:Add Spark Streaming + 


> Spark Streaming + MQTT Integration Guide
> 
>
> Key: SPARK-9597
> URL: https://issues.apache.org/jira/browse/SPARK-9597
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Prabeesh K
>
> Add Spark Streaming + MQTT Integration Guide like
> [Spark Streaming + Flume Integration 
> Guide|http://spark.apache.org/docs/latest/streaming-flume-integration.html]
> [Spark Streaming + Kinesis 
> Integration|http://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
> [Spark Streaming + Kafka Integration 
> Guide|http://spark.apache.org/docs/latest/streaming-kafka-integration.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9597) Add Spark Streaming + MQTT Integration Guide

2015-08-04 Thread Prabeesh K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabeesh K updated SPARK-9597:
--
Summary: Add Spark Streaming + MQTT Integration Guide  (was: Spark 
Streaming + MQTT Integration Guide)

> Add Spark Streaming + MQTT Integration Guide
> 
>
> Key: SPARK-9597
> URL: https://issues.apache.org/jira/browse/SPARK-9597
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Prabeesh K
>
> Add Spark Streaming + MQTT Integration Guide like
> [Spark Streaming + Flume Integration 
> Guide|http://spark.apache.org/docs/latest/streaming-flume-integration.html]
> [Spark Streaming + Kinesis 
> Integration|http://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
> [Spark Streaming + Kafka Integration 
> Guide|http://spark.apache.org/docs/latest/streaming-kafka-integration.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9583) build/mvn script should not print debug messages to stdout

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-9583:
--
Assignee: Marcelo Vanzin

> build/mvn script should not print debug messages to stdout
> --
>
> Key: SPARK-9583
> URL: https://issues.apache.org/jira/browse/SPARK-9583
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.5.0
>
>
> Doing that means it cannot be used to run {{make-distribution.sh}}, which 
> parses the stdout of maven commands.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9583) build/mvn script should not print debug messages to stdout

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-9583:
--
Fix Version/s: 1.5.0

> build/mvn script should not print debug messages to stdout
> --
>
> Key: SPARK-9583
> URL: https://issues.apache.org/jira/browse/SPARK-9583
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.5.0
>
>
> Doing that means it cannot be used to run {{make-distribution.sh}}, which 
> parses the stdout of maven commands.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9583) build/mvn script should not print debug messages to stdout

2015-08-04 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-9583.
---
Resolution: Fixed

> build/mvn script should not print debug messages to stdout
> --
>
> Key: SPARK-9583
> URL: https://issues.apache.org/jira/browse/SPARK-9583
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.5.0
>
>
> Doing that means it cannot be used to run {{make-distribution.sh}}, which 
> parses the stdout of maven commands.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6283) Add a CassandraInputDStream to stream from a C* table

2015-08-04 Thread Helena Edelson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Helena Edelson closed SPARK-6283.
-
Resolution: Done

I've written this, but sadly DataStax has decided to close-source it.

> Add a CassandraInputDStream to stream from a C* table
> -
>
> Key: SPARK-6283
> URL: https://issues.apache.org/jira/browse/SPARK-6283
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Helena Edelson
>
> Add support for streaming from Cassandra to Spark Streaming - external.
> Related ticket: https://datastax-oss.atlassian.net/browse/SPARKC-40 
> [~helena_e] is doing the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9598) do not expose generic getter in internal row

2015-08-04 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-9598:
--

 Summary: do not expose generic getter in internal row
 Key: SPARK-9598
 URL: https://issues.apache.org/jira/browse/SPARK-9598
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9598) do not expose generic getter in internal row

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9598:
---

Assignee: Apache Spark

> do not expose generic getter in internal row
> 
>
> Key: SPARK-9598
> URL: https://issues.apache.org/jira/browse/SPARK-9598
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9598) do not expose generic getter in internal row

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653691#comment-14653691
 ] 

Apache Spark commented on SPARK-9598:
-

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7932

> do not expose generic getter in internal row
> 
>
> Key: SPARK-9598
> URL: https://issues.apache.org/jira/browse/SPARK-9598
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9598) do not expose generic getter in internal row

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9598:
---

Assignee: (was: Apache Spark)

> do not expose generic getter in internal row
> 
>
> Key: SPARK-9598
> URL: https://issues.apache.org/jira/browse/SPARK-9598
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9599) Dynamically partitioning based on key-distribution

2015-08-04 Thread Zoltán Zvara (JIRA)
Zoltán Zvara created SPARK-9599:
---

 Summary: Dynamically partitioning based on key-distribution
 Key: SPARK-9599
 URL: https://issues.apache.org/jira/browse/SPARK-9599
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 1.4.1
Reporter: Zoltán Zvara


When using, for example, the {{groupByKey}} operator with the default 
{{HashPartitioner}}, heavy keys may all be partitioned into the same bucket, 
later raising an OOM error at the result partition. A domain-based partitioner 
might not be able to help when the underlying key distribution changes from 
time to time (for example when dealing with data streams).

Spark should identify these situations and change the partitioning accordingly 
when the current partitioning would later raise an OOM.
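
A minimal sketch of the kind of partitioner this could produce once the heavy 
keys are known (the class name and the way heavy keys are supplied are 
illustrative assumptions, not an existing Spark API):

{code}
import org.apache.spark.Partitioner

// Sketch: pin each known heavy key to its own partition and hash the long tail
// into the remaining partitions, so no single bucket receives all heavy values.
class SkewAwarePartitioner(partitions: Int, heavyKeys: Seq[Any]) extends Partitioner {
  require(heavyKeys.size < partitions, "need spare partitions for the non-heavy keys")

  private val heavyIndex = heavyKeys.zipWithIndex.toMap
  private val tailPartitions = partitions - heavyKeys.size

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = heavyIndex.get(key) match {
    case Some(i) => i                                   // dedicated partition per heavy key
    case None =>
      val h = if (key == null) 0 else key.hashCode() % tailPartitions
      heavyKeys.size + (if (h < 0) h + tailPartitions else h)
  }
}

// Usage sketch: pairs is an RDD[(String, V)] and the heavy keys were found beforehand.
// val grouped = pairs.groupByKey(new SkewAwarePartitioner(200, Seq("hotKey1", "hotKey2")))
{code}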



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7160) Support converting DataFrames to typed RDDs.

2015-08-04 Thread Ray Ortigas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653713#comment-14653713
 ] 

Ray Ortigas commented on SPARK-7160:


Thanks for trying to resolve the conflicts, Michael Armbrust.

Would be happy to sync up around the beginning of 1.6. Would you happen to know 
roughly when that will be?

In the meantime, I'll get my fork re-synced with master and start re-applying 
my changes on it...

> Support converting DataFrames to typed RDDs.
> 
>
> Key: SPARK-7160
> URL: https://issues.apache.org/jira/browse/SPARK-7160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Ray Ortigas
>Assignee: Ray Ortigas
>Priority: Critical
>
> As a Spark user still working with RDDs, I'd like the ability to convert a 
> DataFrame to a typed RDD.
> For example, if I've converted RDDs to DataFrames so that I could save them 
> as Parquet or CSV files, I would like to rebuild the RDD from those files 
> automatically rather than writing the row-to-type conversion myself.
> {code}
> val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), 
> Food("cherry", 3)))
> val df0 = rdd0.toDF()
> df0.save("foods.parquet")
> val df1 = sqlContext.load("foods.parquet")
> val rdd1 = df1.toTypedRDD[Food]()
> // rdd0 and rdd1 should have the same elements
> {code}
> I originally submitted a smaller PR for spark-csv, but Reynold Xin suggested 
> that converting a DataFrame to a typed RDD wasn't something specific to 
> spark-csv.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7160) Support converting DataFrames to typed RDDs.

2015-08-04 Thread Ray Ortigas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653713#comment-14653713
 ] 

Ray Ortigas edited comment on SPARK-7160 at 8/4/15 2:28 PM:


Thanks for trying to resolve the conflicts, [~marmbrus].

Would be happy to sync up around the beginning of 1.6. Would you happen to know 
roughly when that will be?

In the meantime, I'll get my fork re-synced with master and start re-applying 
my changes on it...


was (Author: rayortigas):
Thanks for trying to resolve the conflicts, Michael Armbrust.

Would be happy to sync up around the beginning of 1.6. Would you happen to know 
roughly when that will be?

In the meantime, I'll get my fork re-synced with master and start re-applying 
my changes on it...

> Support converting DataFrames to typed RDDs.
> 
>
> Key: SPARK-7160
> URL: https://issues.apache.org/jira/browse/SPARK-7160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Ray Ortigas
>Assignee: Ray Ortigas
>Priority: Critical
>
> As a Spark user still working with RDDs, I'd like the ability to convert a 
> DataFrame to a typed RDD.
> For example, if I've converted RDDs to DataFrames so that I could save them 
> as Parquet or CSV files, I would like to rebuild the RDD from those files 
> automatically rather than writing the row-to-type conversion myself.
> {code}
> val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), 
> Food("cherry", 3)))
> val df0 = rdd0.toDF()
> df0.save("foods.parquet")
> val df1 = sqlContext.load("foods.parquet")
> val rdd1 = df1.toTypedRDD[Food]()
> // rdd0 and rdd1 should have the same elements
> {code}
> I originally submitted a smaller PR for spark-csv, but Reynold Xin suggested 
> that converting a DataFrame to a typed RDD wasn't something specific to 
> spark-csv.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9599) Dynamic partitioning based on key-distribution

2015-08-04 Thread Zoltán Zvara (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Zvara updated SPARK-9599:

Summary: Dynamic partitioning based on key-distribution  (was: Dynamically 
partitioning based on key-distribution)

> Dynamic partitioning based on key-distribution
> --
>
> Key: SPARK-9599
> URL: https://issues.apache.org/jira/browse/SPARK-9599
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.4.1
>Reporter: Zoltán Zvara
>
> When using, for example, the {{groupByKey}} operator with the default 
> {{HashPartitioner}}, heavy keys may all be partitioned into the same bucket, 
> later raising an OOM error at the result partition. A domain-based 
> partitioner might not be able to help when the underlying key distribution 
> changes from time to time (for example when dealing with data streams).
> Spark should identify these situations and change the partitioning 
> accordingly when the current partitioning would later raise an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9599) Dynamic partitioning based on key-distribution

2015-08-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653723#comment-14653723
 ] 

Sean Owen commented on SPARK-9599:
--

For example, in the case of groupByKey, how would anything know a key mapped to 
many values before performing a shuffle anyway?

> Dynamic partitioning based on key-distribution
> --
>
> Key: SPARK-9599
> URL: https://issues.apache.org/jira/browse/SPARK-9599
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.4.1
>Reporter: Zoltán Zvara
>
> When using, for example, the {{groupByKey}} operator with the default 
> {{HashPartitioner}}, heavy keys may all be partitioned into the same bucket, 
> later raising an OOM error at the result partition. A domain-based 
> partitioner might not be able to help when the underlying key distribution 
> changes from time to time (for example when dealing with data streams).
> Spark should identify these situations and change the partitioning 
> accordingly when the current partitioning would later raise an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9599) Dynamic partitioning based on key-distribution

2015-08-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653723#comment-14653723
 ] 

Sean Owen edited comment on SPARK-9599 at 8/4/15 2:33 PM:
--

For example, in the case of groupByKey, how would anything know a key mapped to 
many values before performing a shuffle anyway? 

EDIT: err, I mean a distributed count operation, which isn't trivial but not a 
full shuffle I suppose. Interesting, not sure if there are subtle reasons this 
is hard to work around or not, because now the very partitioning is a function 
of all values in the parent.


was (Author: srowen):
For example, in the case of groupByKey, how would anything know a key mapped to 
many values before performing a shuffle anyway?

> Dynamic partitioning based on key-distribution
> --
>
> Key: SPARK-9599
> URL: https://issues.apache.org/jira/browse/SPARK-9599
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.4.1
>Reporter: Zoltán Zvara
>
> When using, for example, the {{groupByKey}} operator with the default 
> {{HashPartitioner}}, heavy keys may all be partitioned into the same bucket, 
> later raising an OOM error at the result partition. A domain-based 
> partitioner might not be able to help when the underlying key distribution 
> changes from time to time (for example when dealing with data streams).
> Spark should identify these situations and change the partitioning 
> accordingly when the current partitioning would later raise an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9599) Dynamic partitioning based on key-distribution

2015-08-04 Thread Zoltán Zvara (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653746#comment-14653746
 ] 

Zoltán Zvara commented on SPARK-9599:
-

What I can think of is a new, guided partitioning for the case when an 
in-progress shuffle write indicates that a few buckets will, with high 
probability, raise an OOM error, or will lead to very slow execution when the 
user operator applied to the grouped keys is expensive. In other words, you 
might not get an OOM, but a few very slow tasks.

My first idea is to track the distribution of keys while the shuffle write is 
in progress. When we identify that an OOM or a very slow execution is likely, 
we use the key distribution captured so far to construct a new partitioner 
that partitions evenly.
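
A rough sketch of the estimation step, assuming (for illustration only) that 
the key distribution is approximated by sampling the parent RDD rather than 
being captured during the shuffle write itself; the fraction and skew threshold 
are arbitrary values:

{code}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Sketch: estimate key frequencies from a small sample and flag keys that would
// overload a single hash bucket under plain hash partitioning.
def findHeavyKeys[K: ClassTag, V: ClassTag](
    pairs: RDD[(K, V)],
    numPartitions: Int,
    fraction: Double = 0.01,
    skewFactor: Double = 5.0): Seq[K] = {
  val counts = pairs.sample(withReplacement = false, fraction)
    .map { case (k, _) => (k, 1L) }
    .reduceByKey(_ + _)
  val total: Double = counts.values.sum()
  val expectedPerPartition = total / numPartitions
  counts.filter { case (_, c) => c > skewFactor * expectedPerPartition }
    .keys
    .collect()
    .toSeq
}
{code}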

> Dynamic partitioning based on key-distribution
> --
>
> Key: SPARK-9599
> URL: https://issues.apache.org/jira/browse/SPARK-9599
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.4.1
>Reporter: Zoltán Zvara
>
> When using, for example, the {{groupByKey}} operator with the default 
> {{HashPartitioner}}, heavy keys may all be partitioned into the same bucket, 
> later raising an OOM error at the result partition. A domain-based 
> partitioner might not be able to help when the underlying key distribution 
> changes from time to time (for example when dealing with data streams).
> Spark should identify these situations and change the partitioning 
> accordingly when the current partitioning would later raise an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9600) DataFrameWriter.saveAsTable always writes data to "/user/hive/warehouse"

2015-08-04 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-9600:
-

 Summary: DataFrameWriter.saveAsTable always writes data to 
"/user/hive/warehouse"
 Key: SPARK-9600
 URL: https://issues.apache.org/jira/browse/SPARK-9600
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: Cheng Lian
Priority: Critical


Having a {{hive-site.xml}} with a non-default {{hive.metastore.warehouse.dir}} 
value, Spark SQL still writes to the default warehouse location 
{{/user/hive/warehouse}} when saving data source tables using 
{{DataFrameWriter.saveAsTable()}}:
{noformat}
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore_hive13_hadoop2</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://localhost:9000/user/hive/warehouse_hive13</value>
  </property>
</configuration>
{noformat}
Spark shell snippet to reproduce:
{noformat}
sqlContext.range(10).write.saveAsTable("xxx")
{noformat}
Running {{DESC EXTENDED xxx}} in Hive to check the SerDe properties:
{noformat}
...
location:hdfs://localhost:9000/user/hive/warehouse_hive13/xxx
...
parameters:{path=hdfs://localhost:9000/user/hive/warehouse/xxx, 
serialization.format=1})
...
{noformat}
We are probably using execution Hive configuration when calling 
{{HiveMetastoreCatalog.hiveDefaultTableFilePath()}}.
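
A possible interim workaround (a sketch only; whether an explicit {{path}} 
option fully sidesteps the bug is an assumption, and the location below is 
simply the warehouse directory from the hive-site.xml above):

{code}
// Give the data source table an explicit location so the default warehouse
// directory is not consulted when resolving where to write the files.
sqlContext.range(10)
  .write
  .format("parquet")
  .option("path", "hdfs://localhost:9000/user/hive/warehouse_hive13/xxx")
  .saveAsTable("xxx")
{code}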



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9601) Join example fix in streaming-programming-guide.md

2015-08-04 Thread Jayant Shekhar (JIRA)
Jayant Shekhar created SPARK-9601:
-

 Summary: Join example fix in streaming-programming-guide.md
 Key: SPARK-9601
 URL: https://issues.apache.org/jira/browse/SPARK-9601
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.1
Reporter: Jayant Shekhar


Stream-Stream Join has the following signature for Java in the guide:

JavaPairDStream<String, String> joinedStream = stream1.join(stream2);

It should be:
JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
stream1.join(stream2);

Same for windowed stream join. It should be:

JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
windowedStream1.join(windowedStream2);






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-08-04 Thread Patrick Crenshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653799#comment-14653799
 ] 

Patrick Crenshaw commented on SPARK-9478:
-

If I work on this, should I wait until 
https://issues.apache.org/jira/browse/SPARK-3717 is finished?

> Add class weights to Random Forest
> --
>
> Key: SPARK-9478
> URL: https://issues.apache.org/jira/browse/SPARK-9478
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.4.1
>Reporter: Patrick Crenshaw
>
> Currently, this implementation of random forest does not support class 
> weights. Class weights are important when there is imbalanced training data 
> or the evaluation metric of a classifier is imbalanced (e.g. true positive 
> rate at some false positive threshold). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9601) Join example fix in streaming-programming-guide.md

2015-08-04 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9601:
-
Priority: Trivial  (was: Major)

Looks good -- are you making a PR?

> Join example fix in streaming-programming-guide.md
> --
>
> Key: SPARK-9601
> URL: https://issues.apache.org/jira/browse/SPARK-9601
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.1
>Reporter: Jayant Shekhar
>Priority: Trivial
>
> Stream-Stream Join has the following signature for Java in the guide:
> JavaPairDStream<String, String> joinedStream = stream1.join(stream2);
> It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> stream1.join(stream2);
> Same for windowed stream join. It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> windowedStream1.join(windowedStream2);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9432) Audit expression unit tests to make sure we pass the proper numeric ranges

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9432:
---

Assignee: Yijie Shen  (was: Apache Spark)

> Audit expression unit tests to make sure we pass the proper numeric ranges
> --
>
> Key: SPARK-9432
> URL: https://issues.apache.org/jira/browse/SPARK-9432
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Yijie Shen
>Priority: Blocker
>
> For example, if an expression accepts Int and Long, we should make sure we 
> have one test case for Int that uses a numeric input larger than the max 
> value of a Short, and one test case for Long that uses a numeric input larger 
> than the max value of an Int.
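
For instance, the boundary inputs meant above could look like this (an 
illustrative sketch; the value names are made up):

{code}
// Inputs that distinguish the overloads: the Int value does not fit in a Short,
// and the Long value does not fit in an Int.
val intBeyondShortRange: Int  = Short.MaxValue.toInt + 1    // 32768
val longBeyondIntRange:  Long = Int.MaxValue.toLong + 1L    // 2147483648
{code}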



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9432) Audit expression unit tests to make sure we pass the proper numeric ranges

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653844#comment-14653844
 ] 

Apache Spark commented on SPARK-9432:
-

User 'yjshen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7933

> Audit expression unit tests to make sure we pass the proper numeric ranges
> --
>
> Key: SPARK-9432
> URL: https://issues.apache.org/jira/browse/SPARK-9432
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Yijie Shen
>Priority: Blocker
>
> For example, if an expression accepts Int and Long, we should make sure we 
> have one test case for Int that uses a numeric input larger than the max 
> value of a Short, and one test case for Long that uses a numeric input larger 
> than the max value of an Int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9432) Audit expression unit tests to make sure we pass the proper numeric ranges

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9432:
---

Assignee: Apache Spark  (was: Yijie Shen)

> Audit expression unit tests to make sure we pass the proper numeric ranges
> --
>
> Key: SPARK-9432
> URL: https://issues.apache.org/jira/browse/SPARK-9432
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>Priority: Blocker
>
> For example, if an expression accepts Int and Long, we should make sure we 
> have one test case for Int that uses a numeric input larger than the max 
> value of a Short, and one test case for Long that uses a numeric input larger 
> than the max value of an Int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8244) string function: find_in_set

2015-08-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-8244:
--
Assignee: Tarek Auel  (was: Cheng Hao)

> string function: find_in_set
> 
>
> Key: SPARK-8244
> URL: https://issues.apache.org/jira/browse/SPARK-8244
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Tarek Auel
>Priority: Minor
> Fix For: 1.5.0
>
>
> find_in_set(string str, string strList): int
> Returns the first occurrence of str in strList, where strList is a 
> comma-delimited string. Returns null if either argument is null. Returns 0 if 
> the first argument contains any commas. For example, find_in_set('ab', 
> 'abc,b,ab,c,def') returns 3.
> Only add this to SQL, not DataFrame.
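
As a concrete illustration of the semantics described above (a spark-shell 
sketch, assuming a SQLContext named {{sqlContext}} on a build where the 
function is registered):

{code}
// Expected results follow the description above; they are not captured output.
sqlContext.sql("SELECT find_in_set('ab', 'abc,b,ab,c,def')").show()    // 3 (third element)
sqlContext.sql("SELECT find_in_set('ab,c', 'abc,b,ab,c,def')").show()  // 0 (first argument contains a comma)
{code}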



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8244) string function: find_in_set

2015-08-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8244.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

> string function: find_in_set
> 
>
> Key: SPARK-8244
> URL: https://issues.apache.org/jira/browse/SPARK-8244
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Tarek Auel
>Priority: Minor
> Fix For: 1.5.0
>
>
> find_in_set(string str, string strList): int
> Returns the first occurrence of str in strList, where strList is a 
> comma-delimited string. Returns null if either argument is null. Returns 0 if 
> the first argument contains any commas. For example, find_in_set('ab', 
> 'abc,b,ab,c,def') returns 3.
> Only add this to SQL, not DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8246) string function: get_json_object

2015-08-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8246.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7901
[https://github.com/apache/spark/pull/7901]

> string function: get_json_object
> 
>
> Key: SPARK-8246
> URL: https://issues.apache.org/jira/browse/SPARK-8246
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Nathan Howell
> Fix For: 1.5.0
>
>
> get_json_object(string json_string, string path): string
> This is actually fairly complicated. Take a look at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
> Only add this to SQL, not DataFrame.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9541) DateTimeUtils cleanup

2015-08-04 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-9541.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7870
[https://github.com/apache/spark/pull/7870]

> DateTimeUtils cleanup
> -
>
> Key: SPARK-9541
> URL: https://issues.apache.org/jira/browse/SPARK-9541
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yijie Shen
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9504) Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653903#comment-14653903
 ] 

Apache Spark commented on SPARK-9504:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/7934

> Flaky test: o.a.s.streaming.StreamingContextSuite.stop gracefully
> -
>
> Key: SPARK-9504
> URL: https://issues.apache.org/jira/browse/SPARK-9504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.5.0
>
>
> Failure build: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39149/ 
> {code}
> [info] - stop gracefully *** FAILED *** (3 seconds, 522 milliseconds)
> [info]   0 was not greater than 0 (StreamingContextSuite.scala:277)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
> [info]   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
> [info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$21$$anonfun$apply$mcV$sp$3.apply$mcVI$sp(StreamingContextSuite.scala:277)
> [info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply$mcV$sp(StreamingContextSuite.scala:261)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$21.apply(StreamingContextSuite.scala:257)
> [info]   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
> [info]   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$runTest(StreamingContextSuite.scala:42)
> [info]   at 
> org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite.runTest(StreamingContextSuite.scala:42)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> [info]   at scala.collection.immutable.List.foreach(List.scala:318)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
> [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
> [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
> [info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
> [info]   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
> [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite.org$scalatest$BeforeAndAfter$$super$run(StreamingContextSuite.scala:42)
> [info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
> [info]   at 
> org.apache.spark.streaming.StreamingContextSuite.run(StreamingContextSuite.scala:42)
> [info]   at 
> org.scalatest.tools.Framework.org$

[jira] [Created] (SPARK-9602) Remove 'Actor' from the comments

2015-08-04 Thread Nan Zhu (JIRA)
Nan Zhu created SPARK-9602:
--

 Summary: Remove 'Actor' from the comments
 Key: SPARK-9602
 URL: https://issues.apache.org/jira/browse/SPARK-9602
 Project: Spark
  Issue Type: Improvement
  Components: Build, PySpark, Spark Core, Streaming
Reporter: Nan Zhu


Although we have hidden Akka behind the RPC interface, I found that 
Akka/Actor-related comments are still scattered everywhere. To keep things 
consistent, we should remove the "actor"/"akka" wording from the comments...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9602) Remove 'Actor' from the comments

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653919#comment-14653919
 ] 

Apache Spark commented on SPARK-9602:
-

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/7936

> Remove 'Actor' from the comments
> 
>
> Key: SPARK-9602
> URL: https://issues.apache.org/jira/browse/SPARK-9602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, PySpark, Spark Core, Streaming
>Reporter: Nan Zhu
>
> Although we have hidden Akka behind the RPC interface, I found that 
> Akka/Actor-related comments are still scattered everywhere. To keep things 
> consistent, we should remove the "actor"/"akka" wording from the comments...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9601) Join example fix in streaming-programming-guide.md

2015-08-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653918#comment-14653918
 ] 

Apache Spark commented on SPARK-9601:
-

User 'namitk' has created a pull request for this issue:
https://github.com/apache/spark/pull/7935

> Join example fix in streaming-programming-guide.md
> --
>
> Key: SPARK-9601
> URL: https://issues.apache.org/jira/browse/SPARK-9601
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.1
>Reporter: Jayant Shekhar
>Priority: Trivial
>
> Stream-Stream Join has the following signature for Java in the guide:
> JavaPairDStream<String, String> joinedStream = stream1.join(stream2);
> It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> stream1.join(stream2);
> Same for the windowed stream join. It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> windowedStream1.join(windowedStream2);
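
A minimal compilable sketch of the corrected types, assuming two 
JavaPairDStream<String, String> inputs; the class name, method names, and 
window durations below are illustrative placeholders, not code taken from the 
guide or the pull request:

{code}
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

import scala.Tuple2;

public class StreamJoinExample {

  // Stream-stream join: joining (K, V) pairs with (K, W) pairs yields
  // (K, Tuple2<V, W>), so the result type here is
  // JavaPairDStream<String, Tuple2<String, String>>.
  static JavaPairDStream<String, Tuple2<String, String>> joinStreams(
      JavaPairDStream<String, String> stream1,
      JavaPairDStream<String, String> stream2) {
    return stream1.join(stream2);
  }

  // Windowed join: window() keeps the (String, String) element type,
  // so the joined result has the same Tuple2-valued signature as above.
  static JavaPairDStream<String, Tuple2<String, String>> joinWindowedStreams(
      JavaPairDStream<String, String> stream1,
      JavaPairDStream<String, String> stream2) {
    JavaPairDStream<String, String> windowedStream1 =
        stream1.window(Durations.seconds(20));
    JavaPairDStream<String, String> windowedStream2 =
        stream2.window(Durations.minutes(1));
    return windowedStream1.join(windowedStream2);
  }
}
{code}

The point of the fix is that join on (K, V) and (K, W) pair streams produces 
(K, Tuple2<V, W>) values, which the declared result type in the guide needs to 
reflect.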



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9601) Join example fix in streaming-programming-guide.md

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9601:
---

Assignee: Apache Spark

> Join example fix in streaming-programming-guide.md
> --
>
> Key: SPARK-9601
> URL: https://issues.apache.org/jira/browse/SPARK-9601
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.4.1
>Reporter: Jayant Shekhar
>Assignee: Apache Spark
>Priority: Trivial
>
> Stream-Stream Join has the following signature for Java in the guide:
> JavaPairDStream<String, String> joinedStream = stream1.join(stream2);
> It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> stream1.join(stream2);
> Same for the windowed stream join. It should be:
> JavaPairDStream<String, Tuple2<String, String>> joinedStream = 
> windowedStream1.join(windowedStream2);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9602) Remove 'Actor' from the comments

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9602:
---

Assignee: (was: Apache Spark)

> Remove 'Actor' from the comments
> 
>
> Key: SPARK-9602
> URL: https://issues.apache.org/jira/browse/SPARK-9602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, PySpark, Spark Core, Streaming
>Reporter: Nan Zhu
>
> Although we have hidden Akka behind the RPC interface, I found that 
> Akka/Actor-related comments are still scattered throughout the codebase. For 
> consistency, we should remove the "actor"/"akka" wording from the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9602) Remove 'Actor' from the comments

2015-08-04 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9602:
---

Assignee: Apache Spark

> Remove 'Actor' from the comments
> 
>
> Key: SPARK-9602
> URL: https://issues.apache.org/jira/browse/SPARK-9602
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, PySpark, Spark Core, Streaming
>Reporter: Nan Zhu
>Assignee: Apache Spark
>
> Although we have hidden Akka behind the RPC interface, I found that 
> Akka/Actor-related comments are still scattered throughout the codebase. For 
> consistency, we should remove the "actor"/"akka" wording from the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


