[jira] [Comment Edited] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-13 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168989#comment-14168989
 ] 

Anant Daksh Asthana edited comment on SPARK-3838 at 10/13/14 6:22 AM:
--

Thanks [~mengxr], I will follow the instructions. I did also mention that the coding 
guides are centered around Java/Scala.
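
For reference, a minimal sketch of the kind of PySpark example the guide could include (assuming the pyspark.mllib.feature.Word2Vec API mirrors the documented Scala one; the app name and input path below are placeholders):
{code}
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext(appName="Word2VecExample")  # or reuse an existing SparkContext

# Word2Vec expects an RDD of token sequences, e.g. one whitespace-split line per element.
inp = sc.textFile("text8_lines").map(lambda line: line.split(" "))

model = Word2Vec().fit(inp)

# Print the 5 words closest to "spark" in the learned vector space.
for word, similarity in model.findSynonyms("spark", 5):
    print("%s: %f" % (word, similarity))
{code}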


was (Author: slcclimber):
Thanks [~mengxr], I will follow the instructions. I did also mention that the coding 
guides are centered around Java/Scala. It would be nice to create one for 
PySpark which closely follows PEP-8.

 Python code example for Word2Vec in user guide
 --

 Key: SPARK-3838
 URL: https://issues.apache.org/jira/browse/SPARK-3838
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Anant Daksh Asthana
Priority: Trivial





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3334) Spark causes mesos-master memory leak

2014-10-13 Thread Iven Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168999#comment-14168999
 ] 

Iven Hsu commented on SPARK-3334:
-

With Spark 1.1.0, {{akkaFrameSize}} is the same as for other backends and is read from 
configuration. But its minimum value is 32000 and it can't be set to 0, so 
it will still cause mesos-master to leak memory.

Could anyone look into this?

 Spark causes mesos-master memory leak
 -

 Key: SPARK-3334
 URL: https://issues.apache.org/jira/browse/SPARK-3334
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.2
 Environment: Mesos 0.16.0/0.19.0
 CentOS 6.4
Reporter: Iven Hsu

 The {{akkaFrameSize}} is set to {{Long.MaxValue}} in MesosBackend to 
 work around SPARK-1112; this causes every serialized task result to be sent via 
 Mesos TaskStatus.
 mesos-master stores TaskStatus in memory, so when running Spark its memory 
 grows very quickly and it is eventually OOM-killed.
 See MESOS-1746 for more.
 I've tried setting {{akkaFrameSize}} to 0; mesos-master is then not killed, 
 but the driver blocks after success unless I call {{sc.stop()}} to 
 quit it manually. Not sure if this is related to SPARK-1112.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3899) wrong links in streaming doc

2014-10-13 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3899.
---
   Resolution: Fixed
Fix Version/s: 1.1.1
   1.2.0

Issue resolved by pull request 2749
[https://github.com/apache/spark/pull/2749]

 wrong links in streaming doc
 

 Key: SPARK-3899
 URL: https://issues.apache.org/jira/browse/SPARK-3899
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.1.0
Reporter: wangfei
 Fix For: 1.2.0, 1.1.1






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl

2014-10-13 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson updated SPARK-3921:
--
Description: 
As of [this 
commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153],
 standalone mode appears to have lost its WorkerWatcher, because of the swapped 
workerUrl and appId parameters. We still put workerUrl before appId when we 
start standalone executors, and the Executor misinterprets the appId as the 
workerUrl and fails to create the WorkerWatcher.

Note that this does not seem to crash the Standalone executor mode, despite the 
WorkerWatcher failing in its constructor.

  was:As of [this 
commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153],
 standalone mode appears to be broken, because of the swapped workerUrl and 
appId parameters. We still put workerUrl before appId when we start standalone 
executors, and the Executor misinterprets the appId as the workerUrl and fails 
to create the WorkerWatcher.

Summary: WorkerWatcher in Standalone mode fail to come up due to 
invalid workerUrl  (was: Executors in Standalone mode fail to come up due to 
invalid workerUrl)

 WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
 -

 Key: SPARK-3921
 URL: https://issues.apache.org/jira/browse/SPARK-3921
 Project: Spark
  Issue Type: Bug
Reporter: Aaron Davidson
Assignee: Aaron Davidson
Priority: Critical

 As of [this 
 commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153],
  standalone mode appears to have lost its WorkerWatcher, because of the 
 swapped workerUrl and appId parameters. We still put workerUrl before appId 
 when we start standalone executors, and the Executor misinterprets the appId 
 as the workerUrl and fails to create the WorkerWatcher.
 Note that this does not seem to crash the Standalone executor mode, despite 
 the WorkerWatcher failing in its constructor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl

2014-10-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3921:
-
 Target Version/s: 1.2.0
Affects Version/s: 1.2.0

 WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
 -

 Key: SPARK-3921
 URL: https://issues.apache.org/jira/browse/SPARK-3921
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Aaron Davidson
Assignee: Aaron Davidson
Priority: Critical

 As of [this 
 commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153],
  standalone mode appears to have lost its WorkerWatcher, because of the 
 swapped workerUrl and appId parameters. We still put workerUrl before appId 
 when we start standalone executors, and the Executor misinterprets the appId 
 as the workerUrl and fails to create the WorkerWatcher.
 Note that this does not seem to crash the Standalone executor mode, despite 
 the WorkerWatcher failing in its constructor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3905) The keys for sorting the columns of Executor page, Stage page and Storage page are incorrect

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169043#comment-14169043
 ] 

Apache Spark commented on SPARK-3905:
-

User 'witgo' has created a pull request for this issue:
https://github.com/apache/spark/pull/2763

 The keys for sorting the columns of Executor page, Stage page and Storage page 
 are incorrect
 -

 Key: SPARK-3905
 URL: https://issues.apache.org/jira/browse/SPARK-3905
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.0.2, 1.1.0, 1.2.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li
 Fix For: 1.1.1, 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3923) All Standalone Mode services time out with each other

2014-10-13 Thread Aaron Davidson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169044#comment-14169044
 ] 

Aaron Davidson commented on SPARK-3923:
---

I did a little digging hoping to find a post about this, without much luck. 
I did find [this 
post|https://groups.google.com/forum/#!topic/akka-user/X3xzpTCbEFs] which 
recommends using an interval time < pause, which we are not doing. This doesn't 
seem to explain the services all timing out after the heartbeat interval time 
(which is currently 1000 seconds), but it may be good to know in the future.
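
For reference, a hedged configuration sketch of the interval-below-pause relationship that the post recommends (assuming the spark.akka.heartbeat.interval / spark.akka.heartbeat.pauses settings of this Spark line; the values are illustrative, not a verified fix):
{code}
from pyspark import SparkConf, SparkContext

# Illustrative values only: keep the heartbeat interval well below the
# acceptable pause, per the akka-user thread linked above.
conf = (SparkConf()
        .setAppName("HeartbeatTuningSketch")
        .set("spark.akka.heartbeat.interval", "100")    # seconds
        .set("spark.akka.heartbeat.pauses", "6000"))    # seconds

sc = SparkContext(conf=conf)
{code}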

 All Standalone Mode services time out with each other
 -

 Key: SPARK-3923
 URL: https://issues.apache.org/jira/browse/SPARK-3923
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.2.0
Reporter: Aaron Davidson
Priority: Blocker

 I'm seeing an issue where it seems that components in Standalone Mode 
 (Worker, Master, Driver, and Executor) all seem to time out with each other 
 after around 1000 seconds. Here is an example log:
 {code}
 14/10/13 06:43:55 INFO Master: Registering worker 
 ip-10-0-147-189.us-west-2.compute.internal:38922 with 4 cores, 29.0 GB RAM
 14/10/13 06:43:55 INFO Master: Registering worker 
 ip-10-0-175-214.us-west-2.compute.internal:42918 with 4 cores, 59.0 GB RAM
 14/10/13 06:43:56 INFO Master: Registering app Databricks Shell
 14/10/13 06:43:56 INFO Master: Registered app Databricks Shell with ID 
 app-20141013064356-
 ... precisely 1000 seconds later ...
 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote 
 system 
 [akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922] has 
 failed, address is now gated for [5000] ms. Reason is: [Disassociated].
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922 got 
 disassociated, removing it.
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.147.189%3A54956-1#1529980245]
  was not delivered. [2] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got 
 disassociated, removing it.
 14/10/13 07:00:35 INFO Master: Removing worker 
 worker-20141013064354-ip-10-0-175-214.us-west-2.compute.internal-42918 on 
 ip-10-0-175-214.us-west-2.compute.internal:42918
 14/10/13 07:00:35 INFO Master: Telling app of lost executor: 1
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got 
 disassociated, removing it.
 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote 
 system 
 [akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918] has 
 failed, address is now gated for [5000] ms. Reason is: [Disassociated].
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324]
  was not delivered. [3] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.AssociationHandle$Disassociated] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324]
  was not delivered. [4] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:36 INFO ProtocolStateActor: No response from remote. Handshake 
 timed out or transport failure detector triggered.
 14/10/13 07:00:36 INFO Master: 
 akka.tcp://sparkdri...@ip-10-0-175-215.us-west-2.compute.internal:58259 got 
 disassociated, removing it.
 14/10/13 07:00:36 INFO LocalActorRef: Message 
 [akka.remote.transport.AssociationHandle$InboundPayload] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.215%3A41987-3#1944377249]
  was not delivered. [5] dead letters encountered. This logging can 

[jira] [Reopened] (SPARK-3598) cast to timestamp should be the same as hive

2014-10-13 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang reopened SPARK-3598:


reopen to change assignee...

 cast to timestamp should be the same as hive
 

 Key: SPARK-3598
 URL: https://issues.apache.org/jira/browse/SPARK-3598
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Adrian Wang
 Fix For: 1.2.0


 select cast(1000 as timestamp) from src limit 1;
 should return 1970-01-01 00:00:01.
 Also, the current implementation has a bug when the time is before 1970-01-01 
 00:00:00.
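
A hedged way to check the expected semantics from PySpark (assuming a HiveContext and a Hive table named src, as in the query above):
{code}
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="TimestampCastCheck")
hc = HiveContext(sc)

# Hive treats an integral value as seconds since the epoch, so this should
# return 1970-01-01 00:00:01; values before the epoch should also work.
print(hc.sql("SELECT CAST(1000 AS TIMESTAMP) FROM src LIMIT 1").collect())
{code}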



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3598) cast to timestamp should be the same as hive

2014-10-13 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang resolved SPARK-3598.

Resolution: Fixed

 cast to timestamp should be the same as hive
 

 Key: SPARK-3598
 URL: https://issues.apache.org/jira/browse/SPARK-3598
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Adrian Wang
Assignee: Adrian Wang
 Fix For: 1.2.0


 select cast(1000 as timestamp) from src limit 1;
 should return 1970-01-01 00:00:01.
 Also, the current implementation has a bug when the time is before 1970-01-01 
 00:00:00.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3222) cross join support in HiveQl

2014-10-13 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang resolved SPARK-3222.

Resolution: Fixed

 cross join support in HiveQl
 

 Key: SPARK-3222
 URL: https://issues.apache.org/jira/browse/SPARK-3222
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Adrian Wang
Assignee: Adrian Wang
 Fix For: 1.1.0


 Spark SQL hiveQl should support cross join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-3222) cross join support in HiveQl

2014-10-13 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang reopened SPARK-3222:


reopen to change assignee to myself

 cross join support in HiveQl
 

 Key: SPARK-3222
 URL: https://issues.apache.org/jira/browse/SPARK-3222
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Adrian Wang
 Fix For: 1.1.0


 Spark SQL hiveQl should support cross join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3924) Upgrade to Akka version 2.3.6

2014-10-13 Thread Helena Edelson (JIRA)
Helena Edelson created SPARK-3924:
-

 Summary: Upgrade to Akka version 2.3.6
 Key: SPARK-3924
 URL: https://issues.apache.org/jira/browse/SPARK-3924
 Project: Spark
  Issue Type: Dependency upgrade
 Environment: deploy env
Reporter: Helena Edelson


I tried everything I could in sbt, but I can't use the latest Akka version in my 
project with Spark. It would be great if I could.

I also cannot use the latest Typesafe Config (1.2.1), which would likewise be 
great to have.

This is a big change. If I have time I can do a PR.
[~helena_e]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-10-13 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169208#comment-14169208
 ] 

Helena Edelson edited comment on SPARK-2593 at 10/13/14 11:55 AM:
--

[~matei], [~pwendell] Yes, I see the pain point here now. I just created a 
ticket to upgrade the Akka (and thus Typesafe Config) versions, because I am now 
locked into 2.2.3 and have binary incompatibility with the latest Akka 2.3.6 
/ Config 1.2.1. Makes me very sad.

I think I would throw in the towel on this one if you can make it completely 
separate, so that a user with their own ActorSystem and Config versions is not 
affected? Tricky, because when deploying, Spark needs its version (provided?) 
and the user app needs the other.


was (Author: helena_e):
[~matei] [~pwendell] Yes I see the pain point here now. I just created a ticket 
to upgrade Akka and thus Typesafe Config versions because I am now locked into 
2.2.3 and have binary incompatibility with using latest Akka 2.3.6 / config 
1.2.1. Makes me very sad.

I think I would throw in the towel on this one if you can make it completely 
separate so that a user with it's own AkkaSystem and Config versions are not 
affected? Tricky because when deploying, spark needs its version (provided?) 
and the user app needs the other.

 Add ability to pass an existing Akka ActorSystem into Spark
 ---

 Key: SPARK-2593
 URL: https://issues.apache.org/jira/browse/SPARK-2593
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Helena Edelson

 As a developer I want to pass an existing ActorSystem into StreamingContext 
 at load time so that I do not have two actor systems running on a node in an 
 Akka application.
 This would mean having Spark's actor system on its own named dispatchers, as 
 well as exposing the (currently private) creation of its own actor system.
   
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-10-13 Thread Helena Edelson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169208#comment-14169208
 ] 

Helena Edelson commented on SPARK-2593:
---

[~matei] [~pwendell] Yes, I see the pain point here now. I just created a ticket 
to upgrade the Akka (and thus Typesafe Config) versions, because I am now locked into 
2.2.3 and have binary incompatibility with the latest Akka 2.3.6 / Config 
1.2.1. Makes me very sad.

I think I would throw in the towel on this one if you can make it completely 
separate, so that a user with their own ActorSystem and Config versions is not 
affected? Tricky, because when deploying, Spark needs its version (provided?) 
and the user app needs the other.

 Add ability to pass an existing Akka ActorSystem into Spark
 ---

 Key: SPARK-2593
 URL: https://issues.apache.org/jira/browse/SPARK-2593
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Helena Edelson

 As a developer I want to pass an existing ActorSystem into StreamingContext 
 at load time so that I do not have two actor systems running on a node in an 
 Akka application.
 This would mean having Spark's actor system on its own named dispatchers, as 
 well as exposing the (currently private) creation of its own actor system.
   
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1138) Spark 0.9.0 does not work with Hadoop / HDFS

2014-10-13 Thread Sunil Prabhakara (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169254#comment-14169254
 ] 

Sunil Prabhakara commented on SPARK-1138:
-

I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4, and I am observing a 
similar error:

ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at 
akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
at 
akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at 
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446)
...
along with 
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to 
bind to: my-host-name/10.65.42.145:0
at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
...

For the second error, I tried updating the /etc/hosts file with the IP address of 
my host name and updated the spark-env.sh files with the same IP address, as 
suggested in another answer, but I am still stuck on the above issues.

 Spark 0.9.0 does not work with Hadoop / HDFS
 

 Key: SPARK-1138
 URL: https://issues.apache.org/jira/browse/SPARK-1138
 Project: Spark
  Issue Type: Bug
Reporter: Sam Abeyratne

 UPDATE: This problem is certainly related to trying to use Spark 0.9.0 and 
 the latest Cloudera Hadoop / HDFS in the same jar.  It seems that no matter how I 
 fiddle with the deps, they do not play nice together.
 I'm getting a java.util.concurrent.TimeoutException when trying to create a 
 Spark context with 0.9.  I cannot, whatever I do, change the timeout.  I've 
 tried using System.setProperty, the SparkConf mechanism of creating a 
 SparkContext, and the -D flags when executing my jar.  I seem to be able to 
 run simple jobs from the spark-shell OK, but my more complicated jobs require 
 external libraries, so I need to build jars and execute them.
 Some code that causes this:
 println("Creating config")
 val conf = new SparkConf()
   .setMaster(clusterMaster)
   .setAppName("MyApp")
   .setSparkHome(sparkHome)
   .set("spark.akka.askTimeout", parsed.getOrElse("timeouts", "100"))
   .set("spark.akka.timeout", parsed.getOrElse("timeouts", "100"))
 println("Creating sc")
 implicit val sc = new SparkContext(conf)
 The output:
 Creating config
 Creating sc
 log4j:WARN No appenders could be found for logger 
 (akka.event.slf4j.Slf4jLogger).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
 info.
 [ERROR] [02/26/2014 11:05:25.491] [main] [Remoting] Remoting error: [Startup 
 timed out] [
 akka.remote.RemoteTransportException: Startup timed out
   at 
 akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
   at akka.remote.Remoting.start(Remoting.scala:191)
   at 
 akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
   at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
   at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
   at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
   at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
   at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
   at 
 org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
   at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
   at org.apache.spark.SparkContext.init(SparkContext.scala:139)
   at 
 

[jira] [Comment Edited] (SPARK-1138) Spark 0.9.0 does not work with Hadoop / HDFS

2014-10-13 Thread Sunil Prabhakara (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169254#comment-14169254
 ] 

Sunil Prabhakara edited comment on SPARK-1138 at 10/13/14 1:01 PM:
---

I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4, and I am observing a 
similar error:

ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at 
akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
at 
akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at 
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446)
...
along with 
Exception in thread "main" org.jboss.netty.channel.ChannelException: Failed to 
bind to: my-host-name/10.65.42.145:0
at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
...

For the second error, I tried updating the /etc/hosts file with the IP address of 
my host name and updated the spark-env.sh files with the same IP address, as 
suggested in another answer, but I am still stuck on the above issues.

I tried adding Netty 3.6.6 as a dependency, but the issue is still not resolved.


was (Author: sunil.prabhak...@gmail.com):
I am using Cloudera Version 4.2.1, Spark 1.1.0 and Scala 2.10.4; Observing 
similar error 

ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at 
akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
at 
akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
at 
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at 
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1446)
...
along with 
Exception in thread main org.jboss.netty.channel.ChannelException: Failed to 
bind to: my-host-name/10.65.42.145:0
at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
...

For the second error I tried to update the /etc/hosts file with IP address of 
my host name and updated the spark-env.sh files with same IP address as 
suggested in other answer but still struck with the above issues.

 Spark 0.9.0 does not work with Hadoop / HDFS
 

 Key: SPARK-1138
 URL: https://issues.apache.org/jira/browse/SPARK-1138
 

[jira] [Commented] (SPARK-3586) Support nested directories in Spark Streaming

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169276#comment-14169276
 ] 

Apache Spark commented on SPARK-3586:
-

User 'wangxiaojing' has created a pull request for this issue:
https://github.com/apache/spark/pull/2765

 Support nested directories in Spark Streaming
 -

 Key: SPARK-3586
 URL: https://issues.apache.org/jira/browse/SPARK-3586
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.1.0
Reporter: wangxj
Priority: Minor
  Labels: patch
 Fix For: 1.1.0


 For text files, there is the method streamingContext.textFileStream(dataDirectory). 
 Spark Streaming will monitor the directory dataDirectory and process any 
 files created in that directory, but files written in nested directories are not 
 supported.
 E.g., with
 streamingContext.textFileStream("/test")
 and the directory contents:
 /test/file1
 /test/file2
 /test/dr/file1
 this method can only read:
 /test/file1
 /test/file2
 /test/dr/
 but the file /test/dr/file1 is not read.
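
Until nested directories are supported natively, one hedged workaround sketch is to monitor each known subdirectory separately and union the streams (assuming a PySpark StreamingContext; the same approach applies in Scala/Java, and the paths below are the placeholders from the example above):
{code}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="NestedDirsWorkaround")
ssc = StreamingContext(sc, 10)  # 10-second batches

# textFileStream() does not recurse, so build one stream per subdirectory
# and union them into a single DStream.
dirs = ["/test", "/test/dr"]
lines = ssc.union(*[ssc.textFileStream(d) for d in dirs])

lines.pprint()
ssc.start()
ssc.awaitTermination()
{code}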



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2863) Emulate Hive type coercion in native reimplementations of Hive functions

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169306#comment-14169306
 ] 

Apache Spark commented on SPARK-2863:
-

User 'willb' has created a pull request for this issue:
https://github.com/apache/spark/pull/2768

 Emulate Hive type coercion in native reimplementations of Hive functions
 

 Key: SPARK-2863
 URL: https://issues.apache.org/jira/browse/SPARK-2863
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: William Benton
Assignee: William Benton

 Native reimplementations of Hive functions no longer have the same 
 type-coercion behavior as they would if executed via Hive.  As [Michael 
 Armbrust points 
 out|https://github.com/apache/spark/pull/1750#discussion_r15790970], queries 
 like {{SELECT SQRT(2) FROM src LIMIT 1}} succeed in Hive but fail if 
 {{SQRT}} is implemented natively.
 Spark SQL should have Hive-compatible type coercions for arguments to 
 natively-implemented functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3925) Considering the ordering of qualifiers when comparison

2014-10-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-3925:
--

 Summary: Considering the ordering of qualifiers when comparison
 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh


The qualifiers orderings should be considered during the comparison between old 
qualifiers and new qualifiers when calling 'withQualifiers'.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3925) Considering the ordering of qualifiers during comparison

2014-10-13 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-3925:
---
Summary: Considering the ordering of qualifiers during comparison  (was: 
Considering the ordering of qualifiers when comparison)

 Considering the ordering of qualifiers during comparison
 

 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh

 The qualifiers orderings should be considered during the comparison between 
 old qualifiers and new qualifiers when calling 'withQualifiers'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3925) Considering the ordering of qualifiers when comparison

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169311#comment-14169311
 ] 

Apache Spark commented on SPARK-3925:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/2783

 Considering the ordering of qualifiers when comparison
 --

 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh

 The qualifiers orderings should be considered during the comparison between 
 old qualifiers and new qualifiers when calling 'withQualifiers'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3925) Do not consider the ordering of qualifiers during comparison

2014-10-13 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-3925:
---
Description: 
The qualifiers orderings should not be considered during the comparison between 
old qualifiers and new qualifiers when calling 'withQualifiers'.


  was:
The qualifiers orderings should be considered during the comparison between old 
qualifiers and new qualifiers when calling 'withQualifiers'.



 Do not consider the ordering of qualifiers during comparison
 

 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh

 The qualifiers orderings should not be considered during the comparison 
 between old qualifiers and new qualifiers when calling 'withQualifiers'.
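
Illustratively (the real change is in Catalyst's Scala code; the helper name below is hypothetical), the comparison being proposed is order-insensitive:
{code}
# Hypothetical helper mirroring the intent of the fix: qualifier lists that
# differ only in ordering should compare as equal.
def qualifiers_equal(old_qualifiers, new_qualifiers):
    return set(old_qualifiers) == set(new_qualifiers)

assert qualifiers_equal(["t1", "alias"], ["alias", "t1"])   # same set, different order
assert not qualifiers_equal(["t1"], ["t2"])                 # genuinely different
{code}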



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3925) Do not considering the ordering of qualifiers during comparison

2014-10-13 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-3925:
---
Summary: Do not considering the ordering of qualifiers during comparison  
(was: Considering the ordering of qualifiers during comparison)

 Do not considering the ordering of qualifiers during comparison
 ---

 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh

 The qualifiers orderings should be considered during the comparison between 
 old qualifiers and new qualifiers when calling 'withQualifiers'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3925) Do not consider the ordering of qualifiers during comparison

2014-10-13 Thread Liang-Chi Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-3925:
---
Summary: Do not consider the ordering of qualifiers during comparison  
(was: Do not considering the ordering of qualifiers during comparison)

 Do not consider the ordering of qualifiers during comparison
 

 Key: SPARK-3925
 URL: https://issues.apache.org/jira/browse/SPARK-3925
 Project: Spark
  Issue Type: Bug
Reporter: Liang-Chi Hsieh

 The qualifiers orderings should be considered during the comparison between 
 old qualifiers and new qualifiers when calling 'withQualifiers'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3869) ./bin/spark-class miss Java version with _JAVA_OPTIONS set

2014-10-13 Thread cocoatomo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169329#comment-14169329
 ] 

cocoatomo commented on SPARK-3869:
--

Hi [~pwendell], thank you for informing me. Is it OK to use the abbreviated 
last name (e.g. Barack O.) ?

 ./bin/spark-class miss Java version with _JAVA_OPTIONS set
 --

 Key: SPARK-3869
 URL: https://issues.apache.org/jira/browse/SPARK-3869
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.2.0
 Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20
Reporter: cocoatomo

 When the _JAVA_OPTIONS environment variable is set, the command "java -version" 
 outputs a message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
 ./bin/spark-class reads the Java version from the first line of the "java -version" 
 output, so it misreads the Java version when _JAVA_OPTIONS is set.
 commit: a85f24accd3266e0f97ee04d03c22b593d99c062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7

2014-10-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169331#comment-14169331
 ] 

Nicholas Chammas commented on SPARK-922:


[~joshrosen] - Do you mean [this 
script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? It doesn't 
seem to have anything related to Python 2.7.

Anyway, what I meant was if you were open to holding off on updating the Spark 
AMIs until we had also figured out how to automate that process per 
[SPARK-3821]. I should have something for that as soon as this week or next.

 Update Spark AMI to Python 2.7
 --

 Key: SPARK-922
 URL: https://issues.apache.org/jira/browse/SPARK-922
 Project: Spark
  Issue Type: Task
  Components: EC2, PySpark
Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
Reporter: Josh Rosen

 Many Python libraries only support Python 2.7+, so we should make Python 2.7 
 the default Python on the Spark AMIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-922) Update Spark AMI to Python 2.7

2014-10-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169331#comment-14169331
 ] 

Nicholas Chammas edited comment on SPARK-922 at 10/13/14 2:19 PM:
--

[~joshrosen] - Do you mean [this 
script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? It doesn't 
seem to have anything related to Python 2.7.

Anyway, what I meant was if you were open to holding off on updating the Spark 
AMIs until we had also figured out how to automate that process per 
[SPARK-3821]. I should have something for that as soon as this week or next.


was (Author: nchammas):
[~joshrosen] - Do you mean [this 
script|https://github.com/mesos/spark-ec2/blob/v4/create_image.sh]? I doesn't 
seem to have anything related to Python 2.7.

Anyway, what I meant was if you were open to holding off on updating the Spark 
AMIs until we had also figured out how to automate that process per 
[SPARK-3821]. I should have something for that as soon as this week or next.

 Update Spark AMI to Python 2.7
 --

 Key: SPARK-922
 URL: https://issues.apache.org/jira/browse/SPARK-922
 Project: Spark
  Issue Type: Task
  Components: EC2, PySpark
Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
Reporter: Josh Rosen

 Many Python libraries only support Python 2.7+, so we should make Python 2.7 
 the default Python on the Spark AMIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable

2014-10-13 Thread Antoine Amend (JIRA)
Antoine Amend created SPARK-3926:


 Summary: result of JavaRDD collectAsMap() is not serializable
 Key: SPARK-3926
 URL: https://issues.apache.org/jira/browse/SPARK-3926
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.1.0
 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402
Reporter: Antoine Amend


Using the Java API, I want to collect the result of an RDD<String, String> as a 
HashMap using the collectAsMap function:
Map<String, String> map = myJavaRDD.collectAsMap();
This works fine, but when passing this map to another function, such as...
myOtherJavaRDD.mapToPair(new CustomFunction(map))
...this leads to the following error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
    at org.apache.spark.rdd.RDD.map(RDD.scala:270)
    at org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99)
    at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44)
    ../.. MY CLASS ../..
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)

This seems to be due to WrapAsJava.scala being non-serializable:
../..
  implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match {
//case JConcurrentMapWrapper(wrapped) => wrapped
case JMapWrapper(wrapped) => wrapped.asInstanceOf[ju.Map[A, B]]
case _ => new MapWrapper(m)
  }
../..

The workaround is to manually wrap this map into another one (which is serializable):
Map<String, String> map = myJavaRDD.collectAsMap();
Map<String, String> tmp = new HashMap<String, String>(map);
myOtherJavaRDD.mapToPair(new CustomFunction(tmp))




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-922) Update Spark AMI to Python 2.7

2014-10-13 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169380#comment-14169380
 ] 

Josh Rosen commented on SPARK-922:
--

[~nchammas] - I don't think that there's an urgent rush to update the AMIs 
before the next round of releases, so I'm fine with waiting to incorporate this 
into SPARK-3821.

 Update Spark AMI to Python 2.7
 --

 Key: SPARK-922
 URL: https://issues.apache.org/jira/browse/SPARK-922
 Project: Spark
  Issue Type: Task
  Components: EC2, PySpark
Affects Versions: 0.9.0, 0.9.1, 1.0.0, 1.1.0
Reporter: Josh Rosen

 Many Python libraries only support Python 2.7+, so we should make Python 2.7 
 the default Python on the Spark AMIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3926) result of JavaRDD collectAsMap() is not serializable

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169383#comment-14169383
 ] 

Sean Owen commented on SPARK-3926:
--

Yeah, seems fine to just let {{MapWrapper}} implement {{Serializable}}, because 
standard Java {{Map}} implementations are as well. It's backwards-compatible so 
seems like an easy PR to submit if you like.

 result of JavaRDD collectAsMap() is not serializable
 

 Key: SPARK-3926
 URL: https://issues.apache.org/jira/browse/SPARK-3926
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.1.0
 Environment: CentOS / Spark 1.1 / Hadoop Hortonworks 2.4.0.2.1.2.0-402
Reporter: Antoine Amend

 Using the Java API, I want to collect the result of an RDD<String, String> as 
 a HashMap using the collectAsMap function:
 Map<String, String> map = myJavaRDD.collectAsMap();
 This works fine, but when passing this map to another function, such as...
 myOtherJavaRDD.mapToPair(new CustomFunction(map))
 ...this leads to the following error:
 Exception in thread "main" org.apache.spark.SparkException: Task not 
 serializable
   at 
 org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
   at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
   at org.apache.spark.rdd.RDD.map(RDD.scala:270)
   at 
 org.apache.spark.api.java.JavaRDDLike$class.mapToPair(JavaRDDLike.scala:99)
   at org.apache.spark.api.java.JavaPairRDD.mapToPair(JavaPairRDD.scala:44)
   ../.. MY CLASS ../..
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: java.io.NotSerializableException: 
 scala.collection.convert.Wrappers$MapWrapper
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
   at 
 org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
   at 
 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
 at 
 org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
 This seems to be due to WrapAsJava.scala being non-serializable:
 ../..
   implicit def mapAsJavaMap[A, B](m: Map[A, B]): ju.Map[A, B] = m match {
 //case JConcurrentMapWrapper(wrapped) => wrapped
 case JMapWrapper(wrapped) => wrapped.asInstanceOf[ju.Map[A, B]]
 case _ => new MapWrapper(m)
   }
 ../..
 The workaround is to manually wrap this map into another one (which is serializable):
 Map<String, String> map = myJavaRDD.collectAsMap();
 Map<String, String> tmp = new HashMap<String, String>(map);
 myOtherJavaRDD.mapToPair(new CustomFunction(tmp))



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3897) Scala style: format example code

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3897.
--
Resolution: Won't Fix

Given recent discussion, and consensus to not make sweeping style changes, I 
think this is WontFix.

 Scala style: format example code
 

 Key: SPARK-3897
 URL: https://issues.apache.org/jira/browse/SPARK-3897
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: sjk

 https://github.com/apache/spark/pull/2754



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3895) Scala style: Indentation of method

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3895.
--
Resolution: Won't Fix

Given recent discussion, and consensus to not make sweeping style changes, I 
think this is WontFix.

 Scala style: Indentation of method
 --

 Key: SPARK-3895
 URL: https://issues.apache.org/jira/browse/SPARK-3895
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: sjk

 such as https://github.com/apache/spark/pull/2734
 {code:title=core/src/main/scala/org/apache/spark/Aggregator.scala|borderStyle=solid}
 // for example
   def combineCombinersByKey(iter: Iterator[_ <: Product2[K, C]], context: TaskContext)
   : Iterator[(K, C)] =
   {
 ...
   def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]],
       context: TaskContext): Iterator[(K, C)] = {
 {code}
 These do not conform to the rule: 
 https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 There is a lot of code like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3781) code style format

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3781.
--
Resolution: Won't Fix

Given recent discussion, and consensus to not make sweeping style changes, I 
think this is WontFix.

 code style format
 -

 Key: SPARK-3781
 URL: https://issues.apache.org/jira/browse/SPARK-3781
 Project: Spark
  Issue Type: Improvement
Reporter: sjk





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive

2014-10-13 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169406#comment-14169406
 ] 

Josh Rosen commented on SPARK-3896:
---

[~srowen] There actually IS a PR; it looks like the automatic PR linking script 
was broken for a couple of days, which is why it wasn't automatically linked 
here.  However, I'm still confused even after looking at the PR (see my 
comments over there): https://github.com/apache/spark/pull/2751

 checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is 
 expensive
 ---

 Key: SPARK-3896
 URL: https://issues.apache.org/jira/browse/SPARK-3896
 Project: Spark
  Issue Type: Improvement
Reporter: sjk





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3890) remove redundant spark.executor.memory in doc

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169405#comment-14169405
 ] 

Sean Owen commented on SPARK-3890:
--

For some reason the PR was not linked:
https://github.com/apache/spark/pull/2745

 remove redundant spark.executor.memory in doc
 -

 Key: SPARK-3890
 URL: https://issues.apache.org/jira/browse/SPARK-3890
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: WangTaoTheTonic
Priority: Minor

 Seems like there is a redundant spark.executor.memory config item in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169409#comment-14169409
 ] 

Sean Owen commented on SPARK-3896:
--

Oops, my bad. I just realized that some PRs didn't link after looking at other 
recent JIRAs.

 checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is 
 expensive
 ---

 Key: SPARK-3896
 URL: https://issues.apache.org/jira/browse/SPARK-3896
 Project: Spark
  Issue Type: Improvement
Reporter: sjk





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3896) checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is expensive

2014-10-13 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169412#comment-14169412
 ] 

Josh Rosen commented on SPARK-3896:
---

I moved the automatic linking code from Jenkins to my PR review board platform, 
so hopefully it should be more reliable now: 
https://github.com/databricks/spark-pr-dashboard/commit/9b1487cce315fe991d7081a1bae5fc1103f020a5

 checkSpeculatableTasks fask quit loop, invoking checkSpeculatableTasks is 
 expensive
 ---

 Key: SPARK-3896
 URL: https://issues.apache.org/jira/browse/SPARK-3896
 Project: Spark
  Issue Type: Improvement
Reporter: sjk





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3662) Importing pandas breaks included pi.py example

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169424#comment-14169424
 ] 

Sean Owen commented on SPARK-3662:
--

[~esamanas] Do you have a suggested change here, beyond just disambiguating 
imports in your example? Or a different example that doesn't involve an import 
collision? It sounds like, in the modified example, {{random}} is resolved to a 
pandas {{random}}, not the Python standard-library one; that is simply a 
namespace collision, and is why pandas gets dragged in. The example seems to 
fall down before it demonstrates anything else.

 Importing pandas breaks included pi.py example
 --

 Key: SPARK-3662
 URL: https://issues.apache.org/jira/browse/SPARK-3662
 Project: Spark
  Issue Type: Bug
  Components: PySpark, YARN
Affects Versions: 1.1.0
 Environment: Xubuntu 14.04.  Yarn cluster running on Ubuntu 12.04.
Reporter: Evan Samanas

 If I add import pandas at the top of the included pi.py example and submit 
 using spark-submit --master yarn-client, I get this stack trace:
 {code}
 Traceback (most recent call last):
   File /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi.py, line 
 39, in module
 count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
   File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 759, in reduce
 vals = self.mapPartitions(func).collect()
   File /home/evan/pub_src/spark/python/pyspark/rdd.py, line 723, in collect
 bytesInJava = self._jrdd.collect().iterator()
   File 
 /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
  line 538, in __call__
   File 
 /home/evan/pub_src/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, 
 line 300, in get_return_value
 py4j.protocol.Py4JJavaError14/09/23 15:51:58 INFO TaskSetManager: Lost task 
 2.3 in stage 0.0 (TID 10) on executor SERVERNAMEREMOVED: 
 org.apache.spark.api.python.PythonException (Traceback (most recent call 
 last):
   File 
 /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/worker.py,
  line 75, in main
 command = pickleSer._read_with_length(infile)
   File 
 /yarn/nm/usercache/evan/filecache/173/spark-assembly-1.1.0-hadoop2.3.0-cdh5.1.0.jar/pyspark/serializers.py,
  line 150, in _read_with_length
 return self.loads(obj)
 ImportError: No module named algos
 {code}
 The example works fine if I move the statement "from random import random" 
 from the top into the function ("def f(_)") defined in the example.  Near 
 as I can tell, "random" is getting confused with a function of the same name 
 within pandas.algos.  
 Submitting the same script using --master local works, but gives a 
 distressing amount of random characters to stdout or stderr and messes up my 
 terminal:
 {code}
 ...
 [... binary garbage omitted ...]
 14/09/23 15:42:09 INFO SparkContext: Job finished: reduce at 
 /home/evan/pub_src/spark-1.1.0/examples/src/main/python/pi_sframe.py:38, took 
 11.276879779 s
 [... more binary garbage omitted ...]
 Pi is roughly 3.146136
 {code}
 No idea if that's related, but thought I'd include it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-3506) 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-3506.
--
   Resolution: Fixed
Fix Version/s: 1.1.1

Looks like the site has been updated, and I see no SNAPSHOT on the page.

 1.1.0-SNAPSHOT in docs for 1.1.0 under docs/latest
 --

 Key: SPARK-3506
 URL: https://issues.apache.org/jira/browse/SPARK-3506
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.1.0
Reporter: Jacek Laskowski
Assignee: Patrick Wendell
Priority: Trivial
 Fix For: 1.1.1


 In https://spark.apache.org/docs/latest/ there are references to 
 1.1.0-SNAPSHOT:
 * This documentation is for Spark version 1.1.0-SNAPSHOT.
 * For the Scala API, Spark 1.1.0-SNAPSHOT uses Scala 2.10.
 It should be version 1.1.0 since that's the latest released version and the 
 header tells so, too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3927) Extends SPARK-2577 to fix secondary resources

2014-10-13 Thread Ian O Connell (JIRA)
Ian O Connell created SPARK-3927:


 Summary: Extends SPARK-2577 to fix secondary resources
 Key: SPARK-3927
 URL: https://issues.apache.org/jira/browse/SPARK-3927
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Ian O Connell


SPARK-2577 was a partial fix, handling the case of the assembly + app jar. The 
additional resources, however, run into the same issue.

I have a very simple PR ready. Should this code instead be moved inside the 
addResource method, to address it more globally? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3251) Clarify learning interfaces

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169449#comment-14169449
 ] 

Sean Owen commented on SPARK-3251:
--

Is this a subset of / duplicate of SPARK-3702 now, given the discussion?

  Clarify learning interfaces
 

 Key: SPARK-3251
 URL: https://issues.apache.org/jira/browse/SPARK-3251
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.1.0, 1.1.1
Reporter: Christoph Sawade

 *Make threshold mandatory*
 Currently, the output of predict for an example is either the score
 or the class. This side effect is caused by clearThreshold. To
 clarify that behaviour, three different types of predict (predictScore,
 predictClass, predictProbability) were introduced; the threshold is no
 longer optional.
 *Clarify classification interfaces*
 Currently, some functionality is spread across multiple models.
 In order to clarify the structure and simplify the implementation of
 more complex models (like multinomial logistic regression), two new
 classes are introduced:
 - BinaryClassificationModel: for all models that derive a binary 
 classification from a single weight vector. Comprises the thresholding 
 functionality to derive a prediction from a score. It basically captures 
 SVMModel and LogisticRegressionModel.
 - ProbabilisticClassificationModel: this trait defines the interface for 
 models that return a calibrated confidence score (aka a probability).
 *Misc*
 - some renaming
 - add test for probabilistic output
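To make the proposed structure easier to picture, here is a minimal Scala sketch of the traits named in the description. The trait and method names come from the proposal above; the member signatures, the {{features}} parameter, and the use of MLlib's Vector are illustrative assumptions, not the final API:

{code}
import org.apache.spark.mllib.linalg.Vector

// Illustrative sketch only -- signatures are assumptions based on the proposal above.

/** Models that derive a binary decision from a single weight vector (e.g. SVM, logistic regression). */
trait BinaryClassificationModel {
  def threshold: Double                          // mandatory; no clearThreshold
  def predictScore(features: Vector): Double     // raw score / margin
  def predictClass(features: Vector): Double =   // hard label derived from score and threshold
    if (predictScore(features) > threshold) 1.0 else 0.0
}

/** Models that return a calibrated confidence score (a probability). */
trait ProbabilisticClassificationModel {
  def predictProbability(features: Vector): Double
}
{code}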



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3897) Scala style: format example code

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169451#comment-14169451
 ] 

Apache Spark commented on SPARK-3897:
-

User 'shijinkui' has created a pull request for this issue:
https://github.com/apache/spark/pull/2754

 Scala style: format example code
 

 Key: SPARK-3897
 URL: https://issues.apache.org/jira/browse/SPARK-3897
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: sjk

 https://github.com/apache/spark/pull/2754



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3883) Provide SSL support for Akka and HttpServer based connections

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169458#comment-14169458
 ] 

Apache Spark commented on SPARK-3883:
-

User 'jacek-lewandowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/2739

 Provide SSL support for Akka and HttpServer based connections
 -

 Key: SPARK-3883
 URL: https://issues.apache.org/jira/browse/SPARK-3883
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Jacek Lewandowski

 Spark uses at least 4 logical communication channels:
 1. Control messages - Akka based
 2. JARs and other files - Jetty based (HttpServer)
 3. Computation results - Java NIO based
 4. Web UI - Jetty based
 The aim of this feature is to enable SSL for (1) and (2).
 Why:
 Spark configuration is sent through (1). Spark configuration may contain 
 sensitive information like credentials for accessing external data sources or 
 streams. Application JAR files (2) may include the application logic and 
 therefore they may include information about the structure of the external 
 data sources, and credentials as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3890) remove redundant spark.executor.memory in doc

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169466#comment-14169466
 ] 

Apache Spark commented on SPARK-3890:
-

User 'WangTaoTheTonic' has created a pull request for this issue:
https://github.com/apache/spark/pull/2745

 remove redundant spark.executor.memory in doc
 -

 Key: SPARK-3890
 URL: https://issues.apache.org/jira/browse/SPARK-3890
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: WangTaoTheTonic
Priority: Minor

 Seems like there is a redundant spark.executor.memory config item in docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-3921) WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl

2014-10-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3921:
-
Comment: was deleted

(was: https://github.com/apache/spark/pull/2779)

 WorkerWatcher in Standalone mode fail to come up due to invalid workerUrl
 -

 Key: SPARK-3921
 URL: https://issues.apache.org/jira/browse/SPARK-3921
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Aaron Davidson
Assignee: Aaron Davidson
Priority: Critical

 As of [this 
 commit|https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153],
  standalone mode appears to have lost its WorkerWatcher, because of the 
 swapped workerUrl and appId parameters. We still put workerUrl before appId 
 when we start standalone executors, and the Executor misinterprets the appId 
 as the workerUrl and fails to create the WorkerWatcher.
 Note that this does not seem to crash the Standalone executor mode, despite 
 the failing of the WorkerWatcher during its constructor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169472#comment-14169472
 ] 

Sean Owen commented on SPARK-3480:
--

Given the discussion I suggest this is CannotReproduce?

 Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for 
 sbt build tool during 'Running Scala style checks'
 ---

 Key: SPARK-3480
 URL: https://issues.apache.org/jira/browse/SPARK-3480
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Yi Zhou
Priority: Minor

 Symptom:
 Run ./dev/run-tests and it dumps output like the following:
 SBT_MAVEN_PROFILES_ARGS=-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 
 -Pkinesis-asl
 [Warn] Java 8 tests will not run because JDK version is < 1.8.
 =
 Running Apache RAT checks
 =
 RAT checks passed.
 =
 Running Scala style checks
 =
 Scalastyle checks failed at following occurrences:
 [error] Expected ID character
 [error] Not a valid command: yarn-alpha
 [error] Expected project ID
 [error] Expected configuration
 [error] Expected ':' (if selecting a configuration)
 [error] Expected key
 [error] Not a valid key: yarn-alpha
 [error] yarn-alpha/scalastyle
 [error]   ^
 Possible Cause:
 I checked dev/scalastyle and found that there are two invocations, with the 
 arguments 'yarn-alpha/scalastyle' and 'yarn/scalastyle' respectively, like
 echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 
 yarn-alpha/scalastyle \
    >> scalastyle.txt
 echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 
 yarn/scalastyle \
    >> scalastyle.txt
 From the error message above, sbt seems to complain about them because of the 
 '/' separator. The checks run through after I manually change these to 
 'yarn-alpha:scalastyle' and 'yarn:scalastyle'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3257) Enable :cp to add JARs in spark-shell (Scala 2.11)

2014-10-13 Thread Heather Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169479#comment-14169479
 ] 

Heather Miller commented on SPARK-3257:
---

FYI to Typesafers, I'm about to PR this to scala/scala (sometime today)

 Enable :cp to add JARs in spark-shell (Scala 2.11)
 --

 Key: SPARK-3257
 URL: https://issues.apache.org/jira/browse/SPARK-3257
 Project: Spark
  Issue Type: New Feature
  Components: Spark Shell
Reporter: Matei Zaharia
Assignee: Heather Miller





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2633) enhance spark listener API to gather more spark job information

2014-10-13 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169499#comment-14169499
 ] 

Josh Rosen commented on SPARK-2633:
---

I've opened a pull request to add a stable pull-based progress / status API to 
Spark and would love to receive your feedback: 
https://github.com/apache/spark/pull/2696

 enhance spark listener API to gather more spark job information
 ---

 Key: SPARK-2633
 URL: https://issues.apache.org/jira/browse/SPARK-2633
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Chengxiang Li
Priority: Critical
  Labels: hive
 Attachments: Spark listener enhancement for Hive on Spark job monitor 
 and statistic.docx


 Based on Hive on Spark job status monitoring and statistic collection 
 requirement, try to enhance spark listener API to gather more spark job 
 information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3902) Stabilize AsyncRDDActions and expose its methods in Java API

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169505#comment-14169505
 ] 

Apache Spark commented on SPARK-3902:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2760

 Stabilize AsyncRDDActions and expose its methods in Java API
 

 Key: SPARK-3902
 URL: https://issues.apache.org/jira/browse/SPARK-3902
 Project: Spark
  Issue Type: New Feature
  Components: Java API, Spark Core
Reporter: Josh Rosen
Assignee: Josh Rosen

 The AsyncRDDActions methods are currently the easiest way to determine Spark 
 jobs' ids for use in progress-monitoring code (see SPARK-2636).  
 AsyncRDDActions is currently marked as {{@Experimental}}; for 1.2, I think 
 that we should stabilize this API and expose it in Java, too.
 One concern is whether there's a better async API design that we should 
 prefer over this one as our stable API; I had some ideas for a more general 
 API in SPARK-3626 (discussed in much greater detail on GitHub: 
 https://github.com/apache/spark/pull/2482) but decided against the more 
 general API due to its confusing cancellation semantics.  Given this, I'd be 
 comfortable stabilizing our current API.
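For readers who have not used it, here is a minimal Scala sketch of how the existing experimental async actions look today; the app name, master, and data are illustrative placeholders:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // implicit conversion that adds the *Async methods

import scala.concurrent.Await
import scala.concurrent.duration._

object AsyncCountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("async-count").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 1000, numSlices = 4)

    // countAsync submits the job and returns a FutureAction immediately;
    // this future is the handle that progress-monitoring code would hold on to.
    val future = rdd.countAsync()

    // Blocking here only to keep the example self-contained.
    println("count = " + Await.result(future, 1.minute))
    sc.stop()
  }
}
{code}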



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3590) Expose async APIs in the Java API

2014-10-13 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169507#comment-14169507
 ] 

Josh Rosen commented on SPARK-3590:
---

I've opened a pull request to add these Java APIs: 
https://github.com/apache/spark/pull/2760

 Expose async APIs in the Java API
 -

 Key: SPARK-3590
 URL: https://issues.apache.org/jira/browse/SPARK-3590
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Marcelo Vanzin

 Currently, a single async method is exposed through the Java API 
 (JavaRDDLike::foreachAsync). That method returns a Scala future 
 (FutureAction).
 We should bring the Java API in sync with the Scala async APIs, and also 
 expose Java-friendly types (e.g. a proper java.util.concurrent.Future).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3924) Upgrade to Akka version 2.3.6

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169504#comment-14169504
 ] 

Sean Owen commented on SPARK-3924:
--

I think this is a duplicate of SPARK-2707 and SPARK-2805.

 Upgrade to Akka version 2.3.6
 -

 Key: SPARK-3924
 URL: https://issues.apache.org/jira/browse/SPARK-3924
 Project: Spark
  Issue Type: Dependency upgrade
 Environment: deploy env
Reporter: Helena Edelson

 I tried every sbt in the book but can't use the latest Akka version in my 
 project with Spark. It would be great if I could.
 Also I can not use the latest Typesafe Config - 1.2.1, which would also be 
 great.
 See https://issues.apache.org/jira/browse/SPARK-2593
 This is a big change. If I have time I can do a PR.
 [~helena_e] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2707) Upgrade to Akka 2.3

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169509#comment-14169509
 ] 

Sean Owen commented on SPARK-2707:
--

Can this be considered a duplicate of SPARK-2805, since that's where I see 
recent action?

 Upgrade to Akka 2.3
 ---

 Key: SPARK-2707
 URL: https://issues.apache.org/jira/browse/SPARK-2707
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Yardena

 Upgrade Akka from 2.2 to 2.3. We want to be able to use new Akka and Spray 
 features directly in the same project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3590) Expose async APIs in the Java API

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169506#comment-14169506
 ] 

Apache Spark commented on SPARK-3590:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2760

 Expose async APIs in the Java API
 -

 Key: SPARK-3590
 URL: https://issues.apache.org/jira/browse/SPARK-3590
 Project: Spark
  Issue Type: New Feature
  Components: Java API
Reporter: Marcelo Vanzin

 Currently, a single async method is exposed through the Java API 
 (JavaRDDLike::foreachAsync). That method returns a Scala future 
 (FutureAction).
 We should bring the Java API in sync with the Scala async APIs, and also 
 expose Java-friendly types (e.g. a proper java.util.concurrent.Future).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1834) NoSuchMethodError when invoking JavaPairRDD.reduce() in Java

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1834.
--
Resolution: Duplicate

On another look, I'm almost sure this is the same issue as in SPARK-3266, which 
[~joshrosen] has been looking at.

 NoSuchMethodError when invoking JavaPairRDD.reduce() in Java
 

 Key: SPARK-1834
 URL: https://issues.apache.org/jira/browse/SPARK-1834
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1
 Environment: Redhat Linux, Java 7, Hadoop 2.2, Scala 2.10.4
Reporter: John Snodgrass

 I get a java.lang.NoSuchMethodError when invoking JavaPairRDD.reduce(). Here 
 is the partial stack trace:
 Exception in thread "main" java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at 
 org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:39)
 at 
 org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.spark.api.java.JavaPairRDD.reduce(Lorg/apache/spark/api/java/function/Function2;)Lscala/Tuple2;
 at JavaPairRDDReduceTest.main(JavaPairRDDReduceTest.java:49)...
 I'm using Spark 0.9.1. I checked to ensure that I'm compiling with the same 
 version of Spark as I am running on the cluster. The reduce() method works 
 fine with JavaRDD, just not with JavaPairRDD. Here is a code snippet that 
 exhibits the problem: 
   ArrayList<Integer> array = new ArrayList<Integer>();
   for (int i = 0; i < 10; ++i) {
     array.add(i);
   }
   JavaRDD<Integer> rdd = javaSparkContext.parallelize(array);
   JavaPairRDD<String, Integer> testRDD = rdd.map(new 
 PairFunction<Integer, String, Integer>() {
     @Override
     public Tuple2<String, Integer> call(Integer t) throws Exception {
       return new Tuple2<String, Integer>("" + t, t);
     }
   }).cache();
   
   testRDD.reduce(new Function2<Tuple2<String, Integer>, Tuple2<String, 
 Integer>, Tuple2<String, Integer>>() {
     @Override
     public Tuple2<String, Integer> call(Tuple2<String, Integer> arg0, 
 Tuple2<String, Integer> arg1) throws Exception { 
       return new Tuple2<String, Integer>(arg0._1 + arg1._1, arg0._2 * 10 + arg0._2);
     }
   });



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2493) SBT gen-idea doesn't generate correct Intellij project

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169527#comment-14169527
 ] 

Sean Owen commented on SPARK-2493:
--

Is this still an issue [~dbtsai] ? For IntelliJ, I find it much easier to point 
directly at the Maven build, and that's more the primary build system now 
anyway.

 SBT gen-idea doesn't generate correct Intellij project
 --

 Key: SPARK-2493
 URL: https://issues.apache.org/jira/browse/SPARK-2493
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: DB Tsai

 I have a clean clone of the Spark master repository, and I generated the
 IntelliJ project file with sbt gen-idea as usual. There are two issues
 we have after merging SPARK-1776 (read dependencies from Maven).
 1) After SPARK-1776, sbt gen-idea downloads the dependencies from the
 internet even when those jars are in the local cache. Before the merge,
 a second run of gen-idea would not download anything and would use the
 jars in the cache.
 2) Tests that use a Spark local context cannot be run in IntelliJ.
 They fail with the following exception.
 The current workaround we have is to check out a snapshot from before the
 merge, run gen-idea, and then switch back to the current master. But this
 will not work once master deviates too much from the latest working
 snapshot.
 [ERROR] [07/14/2014 16:27:49.967] [ScalaTest-run] [Remoting] Remoting
 error: [Startup timed out] [
 akka.remote.RemoteTransportException: Startup timed out
 at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
 at akka.remote.Remoting.start(Remoting.scala:191)
 at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
 at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
 at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
 at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
 at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
 at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:153)
 at org.apache.spark.SparkContext.init(SparkContext.scala:202)
 at org.apache.spark.SparkContext.init(SparkContext.scala:117)
 at org.apache.spark.SparkContext.init(SparkContext.scala:132)
 at 
 org.apache.spark.mllib.util.LocalSparkContext$class.beforeAll(LocalSparkContext.scala:29)
 at 
 org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27)
 at 
 org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
 at 
 org.apache.spark.mllib.optimization.LBFGSSuite.beforeAll(LBFGSSuite.scala:27)
 at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
 at org.apache.spark.mllib.optimization.LBFGSSuite.run(LBFGSSuite.scala:27)
 at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
 at 
 org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
 at 
 org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
 at 
 org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
 at 
 org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
 at 
 org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
 at 
 org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
 at org.scalatest.tools.Runner$.run(Runner.scala:883)
 at org.scalatest.tools.Runner.run(Runner.scala)
 at 
 org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:141)
 at 
 org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:32)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
 Caused by: java.util.concurrent.TimeoutException: Futures timed out
 after [1 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:107)
 at akka.remote.Remoting.start(Remoting.scala:173)
 ... 35 more
 ]
 An exception or error caused a 

[jira] [Resolved] (SPARK-2198) Partition the scala build file so that it is easier to maintain

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-2198.
--
Resolution: Won't Fix

Sounds like a WontFix

 Partition the scala build file so that it is easier to maintain
 ---

 Key: SPARK-2198
 URL: https://issues.apache.org/jira/browse/SPARK-2198
 Project: Spark
  Issue Type: Task
  Components: Build
Reporter: Helena Edelson
Priority: Minor
   Original Estimate: 3h
  Remaining Estimate: 3h

 Partition to standard Dependencies, Version, Settings, Publish.scala. keeping 
 the SparkBuild clean to describe the modules and their deps so that changes 
 in versions, for example, need only be made in Version.scala, settings 
 changes such as in scalac in Settings.scala, etc.
 I'd be happy to do this ([~helena_e])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-10-13 Thread Evan Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169545#comment-14169545
 ] 

Evan Chan commented on SPARK-2593:
--

Hmmm  :(   I believe Spark already uses a shaded version of Akka
with a different namespace.   Unfortunately it still creates some
dependency conflicts down the chain, but I don't remember the details.



 Add ability to pass an existing Akka ActorSystem into Spark
 ---

 Key: SPARK-2593
 URL: https://issues.apache.org/jira/browse/SPARK-2593
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Helena Edelson

 As a developer I want to pass an existing ActorSystem into StreamingContext 
 in load-time so that I do not have 2 actor systems running on a node in an 
 Akka application.
 This would mean having spark's actor system on its own named-dispatchers as 
 well as exposing the new private creation of its own actor system.
   
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1849) Broken UTF-8 encoded data gets character replacements and thus can't be fixed

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169549#comment-14169549
 ] 

Sean Owen commented on SPARK-1849:
--

Yes, I think there isn't a 'fix' here short of a quite different 
implementation. Hadoop's text support pretty deeply assumes UTF-8 (partly for 
speed) and the Spark implementation is just Hadoop's. This would have to 
justify rewriting all that. I think you have to treat this as binary data for 
now.

 Broken UTF-8 encoded data gets character replacements and thus can't be 
 fixed
 ---

 Key: SPARK-1849
 URL: https://issues.apache.org/jira/browse/SPARK-1849
 Project: Spark
  Issue Type: Bug
Reporter: Harry Brundage
 Attachments: encoding_test


 I'm trying to process a file which isn't valid UTF-8 data inside hadoop using 
 Spark via {{sc.textFile()}}. Is this possible, and if not, is this a bug that 
 we should fix? It looks like {{HadoopRDD}} uses 
 {{org.apache.hadoop.io.Text.toString}} on all the data it ever reads, which I 
 believe replaces invalid UTF-8 byte sequences with the UTF-8 replacement 
 character, \uFFFD. Some example code mimicking what {{sc.textFile}} does 
 underneath:
 {code}
 scala> sc.textFile(path).collect()(0)
 res8: String = ?pple
 scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], 
 classOf[Text]).map(pair => pair._2.toString).collect()(0).getBytes()
 res9: Array[Byte] = Array(-17, -65, -67, 112, 112, 108, 101)
 scala> sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], 
 classOf[Text]).map(pair => pair._2.getBytes).collect()(0)
 res10: Array[Byte] = Array(-60, 112, 112, 108, 101)
 {code}
 In the above example, the first two snippets show the string representation 
 and byte representation of the example line of text. The string shows a 
 question mark for the replacement character and the bytes reveal the 
 replacement character has been swapped in by {{Text.toString}}. The third 
 snippet shows what happens if you call {{getBytes}} on the {{Text}} object 
 which comes back from hadoop land: we get the real bytes in the file out.
 Now, I think this is a bug, though you may disagree. The text inside my file 
 is perfectly valid iso-8859-1 encoded bytes, which I would like to be able to 
 rescue and re-encode into UTF-8, because I want my application to be smart 
 like that. I think Spark should give me the raw broken string so I can 
 re-encode, but I can't get at the original bytes in order to guess at what 
 the source encoding might be, as they have already been replaced. I'm dealing 
 with data from some CDN access logs which are, to put it nicely, diversely 
 encoded, but I think this is a use case Spark should fully support. So my 
 suggested fix, on which I'd like some guidance, is to change {{textFile}} to 
 spit out the broken strings by not forcing {{Text}}'s UTF-8 decoding.
 Further compounding this issue is that my application is actually in PySpark, 
 but we can talk about how bytes fly through to Scala land after this if we 
 agree that this is an issue at all. 
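A minimal Scala sketch of the byte-level workaround described above: read the raw Hadoop Text bytes and decode them with the charset you believe is correct. The ISO-8859-1 charset and the {{path}} variable are assumptions taken from the description, not a general fix:

{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Decode each record from the bytes Hadoop actually read, instead of relying on
// Text.toString's UTF-8 decoding, which substitutes replacement characters.
val fixed = sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
  .map { case (_, text) =>
    // getBytes returns the reused backing buffer, so only the first getLength bytes are valid.
    new String(text.getBytes, 0, text.getLength, "ISO-8859-1")
  }
{code}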



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1787) Build failure on JDK8 :: SBT fails to load build configuration file

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1787.
--
Resolution: Duplicate

FWIW SBT + Java 8 has worked fine for me on master for a long while, so I assume 
this does not affect 1.1, or perhaps even 1.0.

 Build failure on JDK8 :: SBT fails to load build configuration file
 ---

 Key: SPARK-1787
 URL: https://issues.apache.org/jira/browse/SPARK-1787
 Project: Spark
  Issue Type: New Feature
  Components: Build
Affects Versions: 0.9.0
 Environment: JDK8
 Scala 2.10.X
 SBT 0.12.X
Reporter: Richard Gomes
Priority: Minor

 SBT fails to build under JDK8.
 Please find steps to reproduce the error below:
 (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ uname -a
 Linux terra 3.13-1-amd64 #1 SMP Debian 3.13.10-1 (2014-04-15) x86_64 GNU/Linux
 (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ java -version
 java version "1.8.0_05"
 Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
 (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ scala -version
 Scala code runner version 2.10.3 -- Copyright 2002-2013, LAMP/EPFL
 (j8s10)rgomes@terra:~/workspace/spark-0.9.1$ sbt/sbt clean
 Launching sbt from sbt/sbt-launch-0.12.4.jar
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; 
 support was removed in 8.0
 [info] Loading project definition from 
 /home/rgomes/workspace/spark-0.9.1/project/project
 [info] Compiling 1 Scala source to 
 /home/rgomes/workspace/spark-0.9.1/project/project/target/scala-2.9.2/sbt-0.12/classes...
 [error] error while loading CharSequence, class file 
 '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/lang/CharSequence.class)' is 
 broken
 [error] (bad constant pool tag 15 at byte 1501)
 [error] error while loading Comparator, class file 
 '/opt/developer/jdk1.8.0_05/jre/lib/rt.jar(java/util/Comparator.class)' is 
 broken
 [error] (bad constant pool tag 15 at byte 5003)
 [error] two errors found
 [error] (compile:compile) Compilation failed
 Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1738) Is spark-debugger still available?

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1738.
--
Resolution: Fixed

That document was deleted at some point anyway, and I assume the answer 
is that it does not exist.

 Is spark-debugger still available?
 --

 Key: SPARK-1738
 URL: https://issues.apache.org/jira/browse/SPARK-1738
 Project: Spark
  Issue Type: Question
  Components: Documentation
Reporter: WangTaoTheTonic
Priority: Minor

 I see that the arthur branch (https://github.com/apache/spark/tree/arthur) 
 described in docs/spark-debugger.md does not exist.
 So is the spark-debugger still available? If not, should the document be 
 deleted?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1605) Improve mllib.linalg.Vector

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1605.
--
Resolution: Won't Fix

Another WontFix then?

 Improve mllib.linalg.Vector
 ---

 Key: SPARK-1605
 URL: https://issues.apache.org/jira/browse/SPARK-1605
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Sandeep Singh

 We could make the current Vector a wrapper around Breeze.linalg.Vector?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1573) slight modification with regards to sbt/sbt test

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1573.
--
Resolution: Won't Fix

This has been resolved insofar as the main README.md no longer has this text.

 slight modification with regards to sbt/sbt test
 

 Key: SPARK-1573
 URL: https://issues.apache.org/jira/browse/SPARK-1573
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Nishkam Ravi

 When the sources are built against a certain Hadoop version with 
 SPARK_YARN=true, the same settings seem necessary when running sbt/sbt test. 
 For example:
 SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly
 SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt test
 Otherwise build errors and failing tests are seen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3873) Scala style: check import ordering

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169572#comment-14169572
 ] 

Apache Spark commented on SPARK-3873:
-

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2757

 Scala style: check import ordering
 --

 Key: SPARK-3873
 URL: https://issues.apache.org/jira/browse/SPARK-3873
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Reynold Xin
Assignee: Marcelo Vanzin





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3928) Support wildcard matches on Parquet files

2014-10-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-3928:
---

 Summary: Support wildcard matches on Parquet files
 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor


{{SparkContext.textFile()}} supports patterns like {{part-*}} and 
{{2014-\?\?-\?\?}}. 

It would be nice if {{SparkContext.parquetFile()}} did the same.
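As a concrete illustration (paths are placeholders; the Parquet call is the requested behavior, not something that works today):

{code}
// Glob patterns already work for text input:
val logs = sc.textFile("hdfs:///logs/2014-??-??/part-*")

// The request is for the Parquet reader to accept the same kind of pattern,
// e.g. (hypothetical until this issue is implemented):
// parquetFile("hdfs:///warehouse/events/part-*")
{code}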



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1479) building spark on 2.0.0-cdh4.4.0 failed

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1479.
--
Resolution: Won't Fix

Given discussion in SPARK-3445, I doubt anything more will be done for YARN 
alpha support, as it's on its way out.

 building spark on 2.0.0-cdh4.4.0 failed
 ---

 Key: SPARK-1479
 URL: https://issues.apache.org/jira/browse/SPARK-1479
 Project: Spark
  Issue Type: Question
 Environment: 2.0.0-cdh4.4.0
 Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
 spark 0.9.1
 java version 1.6.0_32
Reporter: jackielihf
 Attachments: mvn.log


 [INFO] 
 
 [ERROR] Failed to execute goal 
 net.alchim31.maven:scala-maven-plugin:3.1.5:compile (scala-compile-first) on 
 project spark-yarn-alpha_2.10: Execution scala-compile-first of goal 
 net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed. CompileFailed - 
 [Help 1]
 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
 goal net.alchim31.maven:scala-maven-plugin:3.1.5:compile 
 (scala-compile-first) on project spark-yarn-alpha_2.10: Execution 
 scala-compile-first of goal 
 net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed.
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:225)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
   at 
 org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
   at 
 org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
   at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
 Caused by: org.apache.maven.plugin.PluginExecutionException: Execution 
 scala-compile-first of goal 
 net.alchim31.maven:scala-maven-plugin:3.1.5:compile failed.
   at 
 org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:110)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
   ... 19 more
 Caused by: Compilation failed
   at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:76)
   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:35)
   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:29)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:71)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:71)
   at 
 sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:101)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4.compileScala$1(AggressiveCompile.scala:70)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:88)
   at 
 sbt.compiler.AggressiveCompile$$anonfun$4.apply(AggressiveCompile.scala:60)
   at 
 sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:24)
   at 
 sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:22)
   at sbt.inc.Incremental$.cycle(Incremental.scala:40)
   at sbt.inc.Incremental$.compile(Incremental.scala:25)
   at sbt.inc.IncrementalCompile$.apply(Compile.scala:20)
   at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:96)
   at 

[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files

2014-10-13 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169580#comment-14169580
 ] 

Nicholas Chammas commented on SPARK-3928:
-

cc [~marmbrus]

 Support wildcard matches on Parquet files
 -

 Key: SPARK-3928
 URL: https://issues.apache.org/jira/browse/SPARK-3928
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Reporter: Nicholas Chammas
Priority: Minor

 {{SparkContext.textFile()}} supports patterns like {{part-*}} and 
 {{2014-\?\?-\?\?}}. 
 It would be nice if {{SparkContext.parquetFile()}} did the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169589#comment-14169589
 ] 

Sean Owen commented on SPARK-1409:
--

Since this test was removed with SPARK-2805, safe to call this closed?

 Flaky Test: actor input stream test in 
 org.apache.spark.streaming.InputStreamsSuite
 -

 Key: SPARK-1409
 URL: https://issues.apache.org/jira/browse/SPARK-1409
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Michael Armbrust
Assignee: Tathagata Das

 Here are just a few cases:
 https://travis-ci.org/apache/spark/jobs/22151827
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1398) Remove FindBugs jsr305 dependency

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1398.
--
Resolution: Won't Fix

From the PR discussion, this had to be reverted because of some build 
problems, so I assume removing this .jar is a WontFix

 Remove FindBugs jsr305 dependency
 -

 Key: SPARK-1398
 URL: https://issues.apache.org/jira/browse/SPARK-1398
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Mark Hamstra
Assignee: Mark Hamstra
Priority: Minor

 We're not making much use of FindBugs at this point, but findbugs-2.0.x is a 
 drop-in replacement for 1.3.9 and does offer significant improvements 
 (http://findbugs.sourceforge.net/findbugs2.html), so it's probably where we 
 want to be for Spark 1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1339) Build error: org.eclipse.paho:mqtt-client

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1339.
--
Resolution: Not a Problem

 Build error: org.eclipse.paho:mqtt-client
 -

 Key: SPARK-1339
 URL: https://issues.apache.org/jira/browse/SPARK-1339
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 0.9.0
Reporter: Ken Williams

 Using Maven, I'm unable to build the 0.9.0 distribution I just downloaded.  
 The Maven error is:
 {code}
 [ERROR] Failed to execute goal on project spark-examples_2.10: Could not 
 resolve dependencies for project 
 org.apache.spark:spark-examples_2.10:jar:0.9.0-incubating: Could not find 
 artifact org.eclipse.paho:mqtt-client:jar:0.4.0 in nexus
 {code}
 My Maven version is 3.2.1, running on Java 1.7.0, using Scala 2.10.4.
 Is there an additional Maven repository I should add or something?
 If I go into the {{pom.xml}} and comment out the {{external/mqtt}} and 
 {{examples}} modules, the build succeeds.  I'm fine without the MQTT stuff, 
 but I would really like to get the examples working because I haven't played 
 with Spark before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1317) sbt doesn't work for building Spark programs

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169629#comment-14169629
 ] 

Sean Owen commented on SPARK-1317:
--

PS if you're still interested in this, I am pretty sure #1 is the correct 
answer. I would use my own sbt (or really, the SBT support in my IDE perhaps, 
or Maven) to build my own app.

 sbt doesn't work for building Spark programs
 

 Key: SPARK-1317
 URL: https://issues.apache.org/jira/browse/SPARK-1317
 Project: Spark
  Issue Type: Bug
  Components: Build, Documentation
Affects Versions: 0.9.0
Reporter: Diana Carroll

 I don't know if this is a doc bug or a product bug, because I don't know how 
 it is supposed to work.
 The Spark quick start guide page has a section that walks you through 
 creating a standalone Spark app in Scala.  I think the instructions worked 
 in 0.8.1 but I can't get them to work in 0.9.0.
 The instructions have you create a directory structure in the canonical sbt 
 format, but do not tell you where to locate this directory.  However, after 
 setting up the structure, the tutorial then instructs you to use the command 
 {code}sbt/sbt package{code}
 which implies that the working directory must be SPARK_HOME.
 I tried it both ways: creating a mysparkapp directory right in SPARK_HOME 
 and creating it in my home directory.  Neither worked, with different results:
 - if I create a mysparkapp directory as instructed in SPARK_HOME, cd to 
 SPARK_HOME and run the command sbt/sbt package as specified, it packages ALL 
 of Spark...but does not build my own app.
 - if I create a mysparkapp directory elsewhere, cd to that directory, and 
 run the command there, I get an error:
 {code}
 $SPARK_HOME/sbt/sbt package
 awk: cmd. line:1: fatal: cannot open file `./project/build.properties' for 
 reading (No such file or directory)
 Attempting to fetch sbt
 /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or 
 directory
 /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or 
 directory
 Our attempt to download sbt locally to sbt/sbt-launch-.jar failed. Please 
 install sbt manually from http://www.scala-sbt.org/
 {code}
 So, either:
 1: the Spark distribution of sbt can only be used to build Spark itself, not 
 your own code...in which case the quick start guide is wrong, and should 
 instead say that users should install sbt separately
 OR
 2: the Spark distribution of sbt CAN be used, with proper configuration, in 
 which case that configuration should be documented (I wasn't able to figure 
 it out, but I didn't try that hard either)
 OR
 3: the Spark distribution of sbt is *supposed* to be able to build Spark 
 apps, but is configured incorrectly in the product, in which case there's a 
 product bug rather than a doc bug
 Although this is not a show-stopper, because the obvious workaround is to 
 simply install sbt separately, I think at least updating the docs is pretty 
 high priority, because most people learning Spark start with that Quick Start 
 page, which doesn't work.
 (If it's doc issue #1, let me know, and I'll fix the docs myself.  :-) )



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3929) Support for fixed-precision decimal

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3929:


 Summary: Support for fixed-precision decimal
 Key: SPARK-3929
 URL: https://issues.apache.org/jira/browse/SPARK-3929
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Matei Zaharia
Assignee: Matei Zaharia






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3930) Add precision and scale to Spark SQL's Decimal type

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3930:


 Summary: Add precision and scale to Spark SQL's Decimal type
 Key: SPARK-3930
 URL: https://issues.apache.org/jira/browse/SPARK-3930
 Project: Spark
  Issue Type: Sub-task
Reporter: Matei Zaharia
Assignee: Matei Zaharia






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3929) Support for fixed-precision decimal

2014-10-13 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-3929:
-
Description: Spark SQL should support fixed-precision decimals, which are 
available in Hive 0.13 (see 
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf)
 as well as in new versions of Parquet. This involves adding precision to the 
decimal type and implementing various rules for math on it (see above).

 Support for fixed-precision decimal
 ---

 Key: SPARK-3929
 URL: https://issues.apache.org/jira/browse/SPARK-3929
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Matei Zaharia
Assignee: Matei Zaharia

 Spark SQL should support fixed-precision decimals, which are available in 
 Hive 0.13 (see 
 https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf)
  as well as in new versions of Parquet. This involves adding precision to the 
 decimal type and implementing various rules for math on it (see above).
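As a rough illustration of the precision/scale rules involved, following common Hive/SQL conventions from the linked document (the {{DecimalType}} and {{DecimalMath}} names below are hypothetical helpers, not Spark's API):

{code}
// Sketch: deriving result precision/scale for decimal arithmetic,
// using the usual Hive/SQL-style rules (illustrative only).
case class DecimalType(precision: Int, scale: Int)

object DecimalMath {
  // a + b: scale = max(s1, s2); one extra integer digit for the carry
  def add(a: DecimalType, b: DecimalType): DecimalType = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    DecimalType(intDigits + scale + 1, scale)
  }

  // a * b: scales add, and one extra digit is reserved for the carry
  def multiply(a: DecimalType, b: DecimalType): DecimalType =
    DecimalType(a.precision + b.precision + 1, a.scale + b.scale)
}

// e.g. DecimalMath.add(DecimalType(10, 2), DecimalType(7, 4)) == DecimalType(13, 4)
{code}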



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3931) Support reading fixed-precision decimals from Parquet

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3931:


 Summary: Support reading fixed-precision decimals from Parquet
 Key: SPARK-3931
 URL: https://issues.apache.org/jira/browse/SPARK-3931
 Project: Spark
  Issue Type: Sub-task
Reporter: Matei Zaharia
Assignee: Matei Zaharia






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3932) Support reading fixed-precision decimals from Hive 0.13

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3932:


 Summary: Support reading fixed-precision decimals from Hive 0.13
 Key: SPARK-3932
 URL: https://issues.apache.org/jira/browse/SPARK-3932
 Project: Spark
  Issue Type: Sub-task
Reporter: Matei Zaharia
Assignee: Matei Zaharia






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1243) spark compilation error

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1243.
--
Resolution: Fixed

This appears to be long since resolved by something else, perhaps a subsequent 
change to Jetty deps. I have never seen this personally, and Jenkins builds are 
fine.

 spark compilation error
 ---

 Key: SPARK-1243
 URL: https://issues.apache.org/jira/browse/SPARK-1243
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Qiuzhuang Lian

 After issuing a git pull from master, Spark no longer compiles.
 Here is the error message; it seems to be related to the Jetty upgrade. @rxin
  
  
  compile
 [info] Compiling 301 Scala sources and 19 Java sources to 
 E:\projects\amplab\spark\core\target\scala-2.10\classes...
 [warn] Class java.nio.channels.ReadPendingException not found - continuing 
 with a stub.
 [error] 
 [error]  while compiling: 
 E:\projects\amplab\spark\core\src\main\scala\org\apache\spark\HttpServer.scala
 [error] during phase: erasure
 [error]  library version: version 2.10.3
 [error] compiler version: version 2.10.3
 [error]   reconstructed args: -Xmax-classfile-name 120 -deprecation 
 -bootclasspath 
 C:\Java\jdk1.6.0_27\jre\lib\resources.jar;C:\Java\jdk1.6.0_27\jre\lib\rt.jar;C:\Java\jdk1.6.0_27\jre\lib\sunrsasign.jar;C:\Java\jdk1.6.0_27\jre\lib\jsse.jar;C:\Java\jdk1.6.0_27\jre\lib\jce.jar;C:\Java\jdk1.6.0_27\jre\lib\charsets.jar;C:\Java\jdk1.6.0_27\jre\lib\modules\jdk.boot.jar;C:\Java\jdk1.6.0_27\jre\classes;C:\Users\Kand\.sbt\boot\scala-2.10.3\lib\scala-library.jar
  -unchecked -classpath 
 

[jira] [Updated] (SPARK-3266) JavaDoubleRDD doesn't contain max()

2014-10-13 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-3266:
--
Affects Version/s: 1.2.0

 JavaDoubleRDD doesn't contain max()
 ---

 Key: SPARK-3266
 URL: https://issues.apache.org/jira/browse/SPARK-3266
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0
Reporter: Amey Chaugule
Assignee: Josh Rosen
 Attachments: spark-repro-3266.tar.gz


 While I can compile my code, I see:
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double;
 when I try to execute my Spark code. Stepping into the JavaDoubleRDD class, I 
 don't see max(),
 although it is clearly listed in the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3933) Optimize decimal type in Spark SQL for those with small precision

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3933:


 Summary: Optimize decimal type in Spark SQL for those with small 
precision
 Key: SPARK-3933
 URL: https://issues.apache.org/jira/browse/SPARK-3933
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Matei Zaharia
Assignee: Matei Zaharia


With fixed-precision decimals, many decimal values will fit in a Long, so we 
can use a Decimal class with a mutable Long field to represent the unscaled 
value, rather than allocating a BigDecimal. We can then do some operations 
directly on these Long fields.
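A minimal sketch of the representation described above, assuming 18 decimal digits as the cutoff for the Long-backed path (the class below is illustrative, not the eventual Spark implementation):

{code}
// Sketch: a mutable decimal that stores small values as an unscaled Long
// and only falls back to java.math.BigDecimal for larger precision.
final class MutableDecimal {
  private var longVal: Long = 0L                   // used when precision <= 18
  private var bigVal: java.math.BigDecimal = null  // fallback for larger precision
  private var scale: Int = 0

  def set(unscaled: Long, precision: Int, scale: Int): MutableDecimal = {
    if (precision <= 18) { longVal = unscaled; bigVal = null }
    else bigVal = java.math.BigDecimal.valueOf(unscaled, scale)
    this.scale = scale
    this
  }

  def toBigDecimal: java.math.BigDecimal =
    if (bigVal ne null) bigVal else java.math.BigDecimal.valueOf(longVal, scale)
}
{code}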



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3266) JavaDoubleRDD doesn't contain max()

2014-10-13 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-3266:
--
Target Version/s: 1.1.1, 1.2.0

 JavaDoubleRDD doesn't contain max()
 ---

 Key: SPARK-3266
 URL: https://issues.apache.org/jira/browse/SPARK-3266
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.2.0
Reporter: Amey Chaugule
Assignee: Josh Rosen
 Attachments: spark-repro-3266.tar.gz


 While I can compile my code, I see:
 Caused by: java.lang.NoSuchMethodError: 
 org.apache.spark.api.java.JavaDoubleRDD.max(Ljava/util/Comparator;)Ljava/lang/Double;
 when I try to execute my Spark code. Stepping into the JavaDoubleRDD class, I 
 don't see max(),
 although it is clearly listed in the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1306) no instructions provided for sbt assembly with Hadoop 2.2

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1306.
--
Resolution: Fixed

I think this was obviated by subsequent changes to this documentation. SBT is 
no longer the focus, but building-spark.md now has more comprehensive 
documentation on building with YARN, including these recent versions.

 no instructions provided for sbt assembly with Hadoop 2.2
 -

 Key: SPARK-1306
 URL: https://issues.apache.org/jira/browse/SPARK-1306
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9.0
Reporter: Diana Carroll

 On the running-on-yarn.html page, in the section "Building a YARN-Enabled 
 Assembly JAR", only the instructions for building for old Hadoop (2.0.5) 
 are provided.  There's a comment that "The build process now also supports 
 new YARN versions (2.2.x). See below."
 However, the only mention below is a single sentence which says "See Building 
 Spark with Maven for instructions on how to build Spark using the Maven 
 process."  There are no instructions for building with sbt. This is different 
 from prior versions of the docs, in which a whole paragraph was provided.
 I'd like to see the command line to build for Hadoop 2.2 included right at 
 the top of the page. Also remove the bit about how it is now supported.   
 Hadoop 2.2 is now the norm, no longer an exception, as I see it. 
 Unfortunately I'm not sure exactly what the command should be.  I tried this, 
 but got errors:
 SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1234) clean up typos and grammar issues in Spark on YARN page

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1234.
--
Resolution: Won't Fix

Given the discussion in https://github.com/apache/spark/pull/130 , this was 
abandoned, but I also don't see the bad text on that page anymore anyhow. It 
probably got improved in another subsequent update.

 clean up typos and grammar issues in Spark on YARN page
 ---

 Key: SPARK-1234
 URL: https://issues.apache.org/jira/browse/SPARK-1234
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9.0
Reporter: Diana Carroll
Priority: Minor

 The "Launch spark application with yarn-client mode" section of this page has 
 several incomplete sentences and typos:
 http://spark.incubator.apache.org/docs/latest/running-on-yarn.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169649#comment-14169649
 ] 

Sean Owen commented on SPARK-1192:
--

PR is actually at https://github.com/apache/spark/pull/2312 and is misnamed. Is 
this still live though?

 Around 30 parameters in Spark are used but undocumented and some are having 
 confusing name
 --

 Key: SPARK-1192
 URL: https://issues.apache.org/jira/browse/SPARK-1192
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Nan Zhu
Assignee: Nan Zhu

 I grepped the code in the core component and found that around 30 parameters in 
 the implementation are actually used but undocumented. By reading the source 
 code, I found that some of them are very useful for the user.
 I suggest making a complete document on the parameters.
 Also, some parameters have confusing names:
 spark.shuffle.copier.threads - this parameter controls how many threads are 
 used when you start a Netty-based shuffle service, but the name does not 
 convey this.
 spark.shuffle.sender.port - a similar problem to the above: when you use a 
 Netty-based shuffle receiver, you have to set up a Netty-based sender; this 
 parameter sets the port used by the Netty sender, but the name does not 
 convey this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169658#comment-14169658
 ] 

Apache Spark commented on SPARK-1192:
-

User 'CodingCat' has created a pull request for this issue:
https://github.com/apache/spark/pull/2312

 Around 30 parameters in Spark are used but undocumented and some are having 
 confusing name
 --

 Key: SPARK-1192
 URL: https://issues.apache.org/jira/browse/SPARK-1192
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Nan Zhu
Assignee: Nan Zhu

 I grepped the code in the core component and found that around 30 parameters in 
 the implementation are actually used but undocumented. By reading the source 
 code, I found that some of them are very useful for the user.
 I suggest making a complete document on the parameters.
 Also, some parameters have confusing names:
 spark.shuffle.copier.threads - this parameter controls how many threads are 
 used when you start a Netty-based shuffle service, but the name does not 
 convey this.
 spark.shuffle.sender.port - a similar problem to the above: when you use a 
 Netty-based shuffle receiver, you have to set up a Netty-based sender; this 
 parameter sets the port used by the Netty sender, but the name does not 
 convey this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3251) Clarify learning interfaces

2014-10-13 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169659#comment-14169659
 ] 

Joseph K. Bradley commented on SPARK-3251:
--

I agree it's hard to say.  Based on the description, I'd say it is a subset 
pertaining to classification models.  Perhaps it should be renamed as such?


  Clarify learning interfaces
 

 Key: SPARK-3251
 URL: https://issues.apache.org/jira/browse/SPARK-3251
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.1.0, 1.1.1
Reporter: Christoph Sawade

 *Make threshold mandatory*
 Currently, the output of predict for an example is either the score
 or the class. This side effect is caused by clearThreshold. To
 clarify that behaviour, three different types of predict (predictScore,
 predictClass, predictProbability) were introduced; the threshold is no
 longer optional.
 *Clarify classification interfaces*
 Currently, some functionality is spread across multiple models.
 In order to clarify the structure and simplify the implementation of
 more complex models (like multinomial logistic regression), two new
 classes are introduced (sketched below):
 - BinaryClassificationModel: for all models that derive a binary 
 classification from a single weight vector. Comprises the thresholding 
 functionality to derive a prediction from a score. It basically captures 
 SVMModel and LogisticRegressionModel.
 - ProbabilisticClassificationModel: this trait defines the interface for 
 models that return a calibrated confidence score (aka probability).
 *Misc*
 - some renaming
 - add a test for probabilistic output
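A minimal sketch of the proposed interfaces, assuming binary labels in {0, 1} (trait and method names approximate the description above; this is not MLlib code):

{code}
import org.apache.spark.mllib.linalg.Vector

// Sketch of the proposed split: score -> threshold -> class, plus an
// optional calibrated-probability variant (illustrative only).
trait BinaryClassificationModel {
  def threshold: Double
  def predictScore(features: Vector): Double
  def predictClass(features: Vector): Double =
    if (predictScore(features) > threshold) 1.0 else 0.0
}

trait ProbabilisticClassificationModel extends BinaryClassificationModel {
  // Calibrated confidence in [0, 1] for the positive class.
  def predictProbability(features: Vector): Double
}
{code}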



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1149) Bad partitioners can cause Spark to hang

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1149.
--
Resolution: Fixed

Looks like Patrick merged this into master in March. It might have been fixed 
for ... 1.0?

 Bad partitioners can cause Spark to hang
 

 Key: SPARK-1149
 URL: https://issues.apache.org/jira/browse/SPARK-1149
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Bryn Keller
Priority: Minor

 While implementing a unit test for lookup, I accidentally created a situation 
 where a partitioner returned a partition number that was outside its range. 
 It should have returned 0 or 1, but in the last case, it returned a -1. 
 Rather than reporting the problem via an exception, Spark simply hangs during 
 the unit test run.
 We should catch this bad behavior by partitioners and throw an exception.
 test("lookup with bad partitioner") {
 val pairs = sc.parallelize(Array((1,2), (3,4), (5,6), (5,7)))
 val p = new Partitioner {
   def numPartitions: Int = 2
   def getPartition(key: Any): Int = key.hashCode() % 2
 }
 val shuffled = pairs.partitionBy(p)
 assert(shuffled.partitioner === Some(p))
 assert(shuffled.lookup(1) === Seq(2))
 assert(shuffled.lookup(5) === Seq(6,7))
 assert(shuffled.lookup(-1) === Seq())
   }
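One way to surface the problem early is to validate the index a partitioner returns and fail fast instead of letting the shuffle hang; a sketch (illustrative only, not the fix that was merged):

{code}
import org.apache.spark.Partitioner

// Sketch: a defensive wrapper that throws when the wrapped partitioner
// returns an out-of-range partition index (e.g. -1 from hashCode % 2).
class ValidatingPartitioner(underlying: Partitioner) extends Partitioner {
  override def numPartitions: Int = underlying.numPartitions
  override def getPartition(key: Any): Int = {
    val p = underlying.getPartition(key)
    require(p >= 0 && p < numPartitions,
      s"Partitioner returned $p for key $key, expected a value in [0, $numPartitions)")
    p
  }
}
{code}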



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1192) Around 30 parameters in Spark are used but undocumented and some are having confusing name

2014-10-13 Thread Nan Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169663#comment-14169663
 ] 

Nan Zhu commented on SPARK-1192:


Yes, I resubmitted https://github.com/apache/spark/pull/2312 at Matei's 
request (removed some, added some).

It's still valid.

 Around 30 parameters in Spark are used but undocumented and some are having 
 confusing name
 --

 Key: SPARK-1192
 URL: https://issues.apache.org/jira/browse/SPARK-1192
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Nan Zhu
Assignee: Nan Zhu

 I grepped the code in the core component and found that around 30 parameters in 
 the implementation are actually used but undocumented. By reading the source 
 code, I found that some of them are very useful for the user.
 I suggest making a complete document on the parameters.
 Also, some parameters have confusing names:
 spark.shuffle.copier.threads - this parameter controls how many threads are 
 used when you start a Netty-based shuffle service, but the name does not 
 convey this.
 spark.shuffle.sender.port - a similar problem to the above: when you use a 
 Netty-based shuffle receiver, you have to set up a Netty-based sender; this 
 parameter sets the port used by the Netty sender, but the name does not 
 convey this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1083) Build fail

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1083.
--
Resolution: Cannot Reproduce

This looks like a git error, and is ancient at this point. I presume that since 
we have evidence that Windows builds subsequently worked, this was either a 
local problem or fixed by something else.

 Build fail
 --

 Key: SPARK-1083
 URL: https://issues.apache.org/jira/browse/SPARK-1083
 Project: Spark
  Issue Type: Bug
  Components: Build, Windows
Affects Versions: 0.7.3
Reporter: Jan Paw

 Problem with building the latest version from github.
 {code:none}[info] Loading project definition from 
 C:\Users\Jan\Documents\GitHub\incubator-s
 park\project\project
 [debug]
 [debug] Initial source changes:
 [debug] removed:Set()
 [debug] added: Set()
 [debug] modified: Set()
 [debug] Removed products: Set()
 [debug] Modified external sources: Set()
 [debug] Modified binary dependencies: Set()
 [debug] Initial directly invalidated sources: Set()
 [debug]
 [debug] Sources indirectly invalidated by:
 [debug] product: Set()
 [debug] binary dep: Set()
 [debug] external source: Set()
 [debug] All initially invalidated sources: Set()
 [debug] Copy resource mappings:
 [debug]
 java.lang.RuntimeException: Nonzero exit code (128): git clone 
 https://github.co
 m/chenkelmann/junit_xml_listener.git 
 C:\Users\Jan\.sbt\0.13\staging\5f76b43a3aca
 87b5c013\junit_xml_listener
 at scala.sys.package$.error(package.scala:27)
 at sbt.Resolvers$.run(Resolvers.scala:134)
 at sbt.Resolvers$.run(Resolvers.scala:123)
 at sbt.Resolvers$$anon$2.clone(Resolvers.scala:78)
 at 
 sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11$
 $anonfun$apply$5.apply$mcV$sp(Resolvers.scala:104)
 at sbt.Resolvers$.creates(Resolvers.scala:141)
 at 
 sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.
 apply(Resolvers.scala:103)
 at 
 sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.
 apply(Resolvers.scala:103)
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(Bui
 ldLoader.scala:90)
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(Bui
 ldLoader.scala:89)
 at scala.Option.map(Option.scala:145)
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:89
 )
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:85
 )
 at sbt.MultiHandler.apply(BuildLoader.scala:16)
 at sbt.BuildLoader.apply(BuildLoader.scala:142)
 at sbt.Load$.loadAll(Load.scala:314)
 at sbt.Load$.loadURI(Load.scala:266)
 at sbt.Load$.load(Load.scala:262)
 at sbt.Load$.load(Load.scala:253)
 at sbt.Load$.apply(Load.scala:137)
 at sbt.Load$.buildPluginDefinition(Load.scala:597)
 at sbt.Load$.buildPlugins(Load.scala:563)
 at sbt.Load$.plugins(Load.scala:551)
 at sbt.Load$.loadUnit(Load.scala:412)
 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258)
 at sbt.Load$$anonfun$15$$anonfun$apply$11.apply(Load.scala:258)
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$
 apply$5$$anonfun$apply$6.apply(BuildLoader.scala:93)
 at 
 sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$
 apply$5$$anonfun$apply$6.apply(BuildLoader.scala:92)
 at sbt.BuildLoader.apply(BuildLoader.scala:143)
 at sbt.Load$.loadAll(Load.scala:314)
 at sbt.Load$.loadURI(Load.scala:266)
 at sbt.Load$.load(Load.scala:262)
 at sbt.Load$.load(Load.scala:253)
 at sbt.Load$.apply(Load.scala:137)
 at sbt.Load$.defaultLoad(Load.scala:40)
 at sbt.BuiltinCommands$.doLoadProject(Main.scala:451)
 at 
 sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445)
 at 
 sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:445)
 at 
 sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.sca
 la:60)
 at 
 sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.sca
 la:60)
 at 
 sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.sca
 la:62)
 at 
 sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.sca
 la:62)
 at sbt.Command$.process(Command.scala:95)
 at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
 at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:100)
 at sbt.State$$anon$1.process(State.scala:179)
 at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
 at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:100)
 at 

[jira] [Resolved] (SPARK-1017) Set the permgen even if we are calling the users sbt (via SBT_OPTS)

2014-10-13 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1017.
--
Resolution: Won't Fix

As I understand, only {{sbt/sbt}} is supported for building Spark with SBT, 
rather than a local {{sbt}}. Maven is the primary build, and it sets 
{{MaxPermSize}} and {{PermGen}} for scalac and scalatest. I think this is 
obsolete and/or already covered then?

 Set the permgen even if we are calling the users sbt (via SBT_OPTS)
 ---

 Key: SPARK-1017
 URL: https://issues.apache.org/jira/browse/SPARK-1017
 Project: Spark
  Issue Type: Improvement
Reporter: Patrick Cogan
Assignee: Patrick Cogan

 Now we will call the user's sbt installation if they have one. But users might 
 run into permgen issues, so we should force the permgen setting unless the user 
 explicitly overrides it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3923) All Standalone Mode services time out with each other

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169692#comment-14169692
 ] 

Apache Spark commented on SPARK-3923:
-

User 'aarondav' has created a pull request for this issue:
https://github.com/apache/spark/pull/2784

 All Standalone Mode services time out with each other
 -

 Key: SPARK-3923
 URL: https://issues.apache.org/jira/browse/SPARK-3923
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.2.0
Reporter: Aaron Davidson
Priority: Blocker

 I'm seeing an issue where components in Standalone Mode (Worker, Master, 
 Driver, and Executor) all seem to time out with each other after around 1000 
 seconds. Here is an example log:
 {code}
 14/10/13 06:43:55 INFO Master: Registering worker 
 ip-10-0-147-189.us-west-2.compute.internal:38922 with 4 cores, 29.0 GB RAM
 14/10/13 06:43:55 INFO Master: Registering worker 
 ip-10-0-175-214.us-west-2.compute.internal:42918 with 4 cores, 59.0 GB RAM
 14/10/13 06:43:56 INFO Master: Registering app Databricks Shell
 14/10/13 06:43:56 INFO Master: Registered app Databricks Shell with ID 
 app-20141013064356-
 ... precisely 1000 seconds later ...
 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote 
 system 
 [akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922] has 
 failed, address is now gated for [5000] ms. Reason is: [Disassociated].
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-147-189.us-west-2.compute.internal:38922 got 
 disassociated, removing it.
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.147.189%3A54956-1#1529980245]
  was not delivered. [2] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got 
 disassociated, removing it.
 14/10/13 07:00:35 INFO Master: Removing worker 
 worker-20141013064354-ip-10-0-175-214.us-west-2.compute.internal-42918 on 
 ip-10-0-175-214.us-west-2.compute.internal:42918
 14/10/13 07:00:35 INFO Master: Telling app of lost executor: 1
 14/10/13 07:00:35 INFO Master: 
 akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918 got 
 disassociated, removing it.
 14/10/13 07:00:35 WARN ReliableDeliverySupervisor: Association with remote 
 system 
 [akka.tcp://sparkwor...@ip-10-0-175-214.us-west-2.compute.internal:42918] has 
 failed, address is now gated for [5000] ms. Reason is: [Disassociated].
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324]
  was not delivered. [3] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:35 INFO LocalActorRef: Message 
 [akka.remote.transport.AssociationHandle$Disassociated] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.214%3A35958-2#314633324]
  was not delivered. [4] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:36 INFO ProtocolStateActor: No response from remote. Handshake 
 timed out or transport failure detector triggered.
 14/10/13 07:00:36 INFO Master: 
 akka.tcp://sparkdri...@ip-10-0-175-215.us-west-2.compute.internal:58259 got 
 disassociated, removing it.
 14/10/13 07:00:36 INFO LocalActorRef: Message 
 [akka.remote.transport.AssociationHandle$InboundPayload] from 
 Actor[akka://sparkMaster/deadLetters] to 
 Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.0.175.215%3A41987-3#1944377249]
  was not delivered. [5] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 14/10/13 07:00:36 INFO Master: Removing app app-20141013064356-
 14/10/13 07:00:36 WARN ReliableDeliverySupervisor: Association with remote 
 system 
 

[jira] [Resolved] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite

2014-10-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1409.
-
Resolution: Won't Fix

 Flaky Test: actor input stream test in 
 org.apache.spark.streaming.InputStreamsSuite
 -

 Key: SPARK-1409
 URL: https://issues.apache.org/jira/browse/SPARK-1409
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Michael Armbrust
Assignee: Tathagata Das

 Here are just a few cases:
 https://travis-ci.org/apache/spark/jobs/22151827
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1463) cleanup unnecessary dependency jars in the spark assembly jars

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169709#comment-14169709
 ] 

Sean Owen commented on SPARK-1463:
--

FWIW I do not see these packages in the final assembly JAR anymore. This may be 
obsolete?

 cleanup unnecessary dependency jars in the spark assembly jars
 --

 Key: SPARK-1463
 URL: https://issues.apache.org/jira/browse/SPARK-1463
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 0.9.0
Reporter: Jenny MA
Priority: Minor
  Labels: easyfix
 Fix For: 1.0.0


 There are a couple of GPL/LGPL-based dependencies which are included in the 
 final assembly jar but are not used by the Spark runtime. I identified the 
 following libraries; we can provide a fix in assembly/pom.xml:
 <exclude>com.google.code.findbugs:*</exclude>
 <exclude>org.acplt:oncrpc:*</exclude>
 <exclude>glassfish:*</exclude>
 <exclude>com.cenqua.clover:clover:*</exclude>
 <exclude>org.glassfish:*</exclude>
 <exclude>org.glassfish.grizzly:*</exclude>
 <exclude>org.glassfish.gmbal:*</exclude>
 <exclude>org.glassfish.external:*</exclude>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1010) Update all unit tests to use SparkConf instead of system properties

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169724#comment-14169724
 ] 

Sean Owen commented on SPARK-1010:
--

Yes, lots of usage in tests still. A lot looks intentional.

{code}
find . -name "*Suite.scala" -type f -exec grep -E "System\.[gs]etProperty" {} \;
...
.format(System.getProperty("user.name", "unknown"),
.format(System.getProperty("user.name", "unknown")).stripMargin
System.setProperty("spark.testing", "true")
System.setProperty("spark.reducer.maxMbInFlight", "1")
System.setProperty("spark.storage.memoryFraction", "0.0001")
System.setProperty("spark.storage.memoryFraction", "0.01")
System.setProperty("spark.authenticate", "false")
System.setProperty("spark.authenticate", "false")
System.setProperty("spark.shuffle.manager", "hash")
System.setProperty("spark.scheduler.mode", "FIFO")
System.setProperty("spark.scheduler.mode", "FAIR")
...
{code}
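For the occurrences that are not intentional, the conversion is mechanical; a sketch of the before/after for a test that builds its own SparkContext:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Before (global, leaks across tests):
//   System.setProperty("spark.reducer.maxMbInFlight", "1")
//   val sc = new SparkContext("local", "test")

// After (scoped to this SparkContext only):
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("test")
  .set("spark.reducer.maxMbInFlight", "1")
val sc = new SparkContext(conf)
{code}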


 Update all unit tests to use SparkConf instead of system properties
 ---

 Key: SPARK-1010
 URL: https://issues.apache.org/jira/browse/SPARK-1010
 Project: Spark
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Patrick Cogan
Assignee: Nirmal
Priority: Minor
  Labels: starter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3934) RandomForest bug in sanity check in DTStatsAggregator

2014-10-13 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-3934:


 Summary: RandomForest bug in sanity check in DTStatsAggregator
 Key: SPARK-3934
 URL: https://issues.apache.org/jira/browse/SPARK-3934
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Joseph K. Bradley


When run with a mix of unordered categorical and continuous features, on 
multiclass classification, RandomForest fails.  The bug is in the sanity checks 
in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices 
for checking whether features are unordered.

Proposal: Remove the sanity checks since they are not really needed, and since 
they would require DTStatsAggregator to keep track of an extra set of indices 
(for the feature subset).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3934) RandomForest bug in sanity check in DTStatsAggregator

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169752#comment-14169752
 ] 

Apache Spark commented on SPARK-3934:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/2785

 RandomForest bug in sanity check in DTStatsAggregator
 -

 Key: SPARK-3934
 URL: https://issues.apache.org/jira/browse/SPARK-3934
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley

 When run with a mix of unordered categorical and continuous features, on 
 multiclass classification, RandomForest fails.  The bug is in the sanity 
 checks in getFeatureOffset and getLeftRightFeatureOffsets, which use the 
 wrong indices for checking whether features are unordered.
 Proposal: Remove the sanity checks since they are not really needed, and 
 since they would require DTStatsAggregator to keep track of an extra set of 
 indices (for the feature subset).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3918) Forget Unpersist in RandomForest.scala(train Method)

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169753#comment-14169753
 ] 

Apache Spark commented on SPARK-3918:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/2785

 Forget Unpersist in RandomForest.scala(train Method)
 

 Key: SPARK-3918
 URL: https://issues.apache.org/jira/browse/SPARK-3918
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.2.0
 Environment: All
Reporter: junlong
Assignee: Joseph K. Bradley
  Labels: decisiontree, train, unpersist
 Fix For: 1.1.0

   Original Estimate: 10m
  Remaining Estimate: 10m

 In version 1.1.0, in DecisionTree.scala's train method, treeInput is persisted 
 in memory but never unpersisted. This caused heavy disk usage.
 In the GitHub version (1.2.0, maybe), in RandomForest.scala's train method, 
 baggedInput is persisted but never unpersisted either.
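The pattern being asked for, sketched generically (the method and variable names below are illustrative, not the MLlib code):

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: cache an intermediate RDD for the duration of training and
// release it before returning, so it does not pin memory/disk afterwards.
def train[T](input: RDD[T])(buildModel: RDD[T] => Unit): Unit = {
  val cached = input.persist(StorageLevel.MEMORY_AND_DISK)
  try {
    buildModel(cached)
  } finally {
    cached.unpersist(blocking = false)  // release the cached blocks once training is done
  }
}
{code}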



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3913) Spark Yarn Client API change to expose Yarn Resource Capacity, Yarn Application Listener and killApplication() API

2014-10-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169760#comment-14169760
 ] 

Apache Spark commented on SPARK-3913:
-

User 'chesterxgchen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2786

 Spark Yarn Client API change to expose Yarn Resource Capacity, Yarn 
 Application Listener and killApplication() API
 --

 Key: SPARK-3913
 URL: https://issues.apache.org/jira/browse/SPARK-3913
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: Chester

 When working with Spark in YARN deployment mode, we have a few issues:
 1) We don't know the YARN maximum capacity (memory and cores) before we 
 specify the number of executors and the memory for the Spark driver and 
 executors. If we set a big number, the job can exceed the limit and get 
 killed.
 It would be better to let the application know the YARN resource capacity 
 ahead of time so that the Spark config can be adjusted dynamically.
 2) Once the job has started, we would like some feedback from the YARN 
 application. Currently, the Spark client basically blocks the call and returns 
 when the job is finished, failed, or killed.
 If the job runs for a few hours, we have no idea how far it has gone, its 
 progress, resource usage, tracking URL, etc.
 3) Once the job is started, you basically can't stop it. The YARN client's 
 stop API doesn't work in most cases in our experience, but the YARN API that 
 does work is killApplication(appId).
 So we need to expose this killApplication() API to the Spark YARN client as 
 well (a rough sketch of the underlying call follows below).
 I will create one pull request and try to address these problems.
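For points 1 and 3, the underlying calls already exist in the stable YARN client API; a rough sketch (assumes Hadoop 2.2+, and is independent of whatever shape the Spark-side wrapper ends up taking):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Sketch: query cluster capacity and kill a running application directly
// through the YARN client API.
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(new YarnConfiguration())
yarnClient.start()

// Maximum container resources the cluster will grant (memory / vcores),
// useful before choosing executor sizes.
val maxCap = yarnClient.createApplication()
  .getNewApplicationResponse.getMaximumResourceCapability

// ... submit the Spark application and obtain its ApplicationId ...

def kill(appId: ApplicationId): Unit = yarnClient.killApplication(appId)
{code}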
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3654) Implement all extended HiveQL statements/commands with a separate parser combinator

2014-10-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-3654:

Assignee: Ravindra Pesala

 Implement all extended HiveQL statements/commands with a separate parser 
 combinator
 ---

 Key: SPARK-3654
 URL: https://issues.apache.org/jira/browse/SPARK-3654
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian
Assignee: Ravindra Pesala
 Fix For: 1.2.0


 Statements and commands like {{SET}}, {{CACHE TABLE}}, and {{ADD JAR}} are 
 currently parsed in a quite hacky way, like this:
 {code}
 if (sql.trim.toLowerCase.startsWith("cache table")) {
   sql.trim.toLowerCase.startsWith("cache table") match {
     ...
   }
 }
 {code}
 It would be much better to add an extra parser combinator that parses these 
 syntax extensions first and then falls back to the normal Hive parser.
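A sketch of the suggested fallback structure (all names here are hypothetical; the eventual Spark parser will differ):

{code}
import scala.util.parsing.combinator.RegexParsers

// Sketch: recognize the Spark-specific statements with a small combinator
// parser; anything it cannot parse is handed to the normal Hive parser
// by the caller.
object ExtensionParser extends RegexParsers {
  sealed trait Extension
  case class CacheTable(name: String) extends Extension
  case class SetCommand(body: String) extends Extension

  private def cacheTable: Parser[Extension] =
    "(?i)CACHE\\s+TABLE".r ~> "\\w+".r ^^ CacheTable
  private def setCommand: Parser[Extension] =
    "(?i)SET\\b".r ~> ".*".r ^^ SetCommand

  // Some(extension) if this is an extended statement, None to fall back to Hive.
  def parse(sql: String): Option[Extension] =
    parseAll(cacheTable | setCommand, sql) match {
      case Success(ext, _) => Some(ext)
      case _               => None
    }
}
{code}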



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3813) Support case when conditional functions in Spark SQL

2014-10-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-3813:

Assignee: Ravindra Pesala

 Support case when conditional functions in Spark SQL
 --

 Key: SPARK-3813
 URL: https://issues.apache.org/jira/browse/SPARK-3813
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Ravindra Pesala(Old.Don't assign to it)
Assignee: Ravindra Pesala
 Fix For: 1.2.0


 SQL queries that use the following conditional functions are not supported 
 in Spark SQL:
 {code}
 CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
 CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
 {code}
 The same functions can work in Spark HiveQL.
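For reference, the same conditional expressed through the HiveQL path, which accepts it today (assumes an existing SparkContext {{sc}}; the table and column names are made up):

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql(
  """SELECT name,
    |       CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END AS bracket
    |FROM people""".stripMargin).collect()
{code}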



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2594) Add CACHE TABLE name AS SELECT ...

2014-10-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2594:

Assignee: Ravindra Pesala

 Add CACHE TABLE name AS SELECT ...
 

 Key: SPARK-2594
 URL: https://issues.apache.org/jira/browse/SPARK-2594
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Assignee: Ravindra Pesala
Priority: Critical
 Fix For: 1.2.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error

2014-10-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-3371:

Assignee: Ravindra Pesala

 Spark SQL: Renaming a function expression with group by gives error
 ---

 Key: SPARK-3371
 URL: https://issues.apache.org/jira/browse/SPARK-3371
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Pei-Lun Lee
Assignee: Ravindra Pesala
 Fix For: 1.2.0


 {code}
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
 val rdd = sc.parallelize(List("""{"foo":"bar"}"""))
 sqlContext.jsonRDD(rdd).registerAsTable("t1")
 sqlContext.registerFunction("len", (s: String) => s.length)
 sqlContext.sql("select len(foo) as a, count(1) from t1 group by len(foo)").collect()
 {code}
 running above code in spark-shell gives the following error
 {noformat}
 14/09/03 17:20:13 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 214)
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
 attribute, tree: foo#0
   at 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
   at 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$2.apply(TreeNode.scala:201)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:199)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
   at 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
   at 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
   at 
 org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 {noformat}
 Removing {{as a}} from the query causes no error.
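A common workaround until the alias binding is fixed, using the same setup as the snippet above: either drop the alias (confirmed above to work), or, assuming the parser accepts a derived table, introduce the alias one level up so the GROUP BY only sees the raw expression:

{code}
// Option 1: no alias on the grouped expression.
sqlContext.sql("select len(foo), count(1) from t1 group by len(foo)").collect()

// Option 2 (assumes derived-table support in the parser): alias in an outer select.
sqlContext.sql(
  "select a, cnt from (select len(foo) as a, count(1) as cnt from t1 group by len(foo)) tmp"
).collect()
{code}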



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


