[jira] [Commented] (SPARK-6824) Fill the docs for DataFrame API in SparkR

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532139#comment-14532139
 ] 

Apache Spark commented on SPARK-6824:
-

User 'hqzizania' has created a pull request for this issue:
https://github.com/apache/spark/pull/5969

> Fill the docs for DataFrame API in SparkR
> -
>
> Key: SPARK-6824
> URL: https://issues.apache.org/jira/browse/SPARK-6824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Qian Huang
>Priority: Blocker
>
> Some of the DataFrame functions in SparkR do not have complete roxygen docs.






[jira] [Assigned] (SPARK-6824) Fill the docs for DataFrame API in SparkR

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6824:
---

Assignee: Qian Huang  (was: Apache Spark)

> Fill the docs for DataFrame API in SparkR
> -
>
> Key: SPARK-6824
> URL: https://issues.apache.org/jira/browse/SPARK-6824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Qian Huang
>Priority: Blocker
>
> Some of the DataFrame functions in SparkR do not have complete roxygen docs.






[jira] [Assigned] (SPARK-6824) Fill the docs for DataFrame API in SparkR

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6824:
---

Assignee: Apache Spark  (was: Qian Huang)

> Fill the docs for DataFrame API in SparkR
> -
>
> Key: SPARK-6824
> URL: https://issues.apache.org/jira/browse/SPARK-6824
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>Priority: Blocker
>
> Some of the DataFrame functions in SparkR do not have complete roxygen docs.






[jira] [Created] (SPARK-7436) Cannot implement nor use custom StandaloneRecoveryModeFactory implementations

2015-05-06 Thread Jacek Lewandowski (JIRA)
Jacek Lewandowski created SPARK-7436:


 Summary: Cannot implement nor use custom 
StandaloneRecoveryModeFactory implementations
 Key: SPARK-7436
 URL: https://issues.apache.org/jira/browse/SPARK-7436
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Jacek Lewandowski


At least, this code fragment is buggy ({{Master.scala}}):

{code}
  case "CUSTOM" =>
val clazz = Class.forName(conf.get("spark.deploy.recoveryMode.factory"))
val factory = clazz.getConstructor(conf.getClass, 
Serialization.getClass)
  .newInstance(conf, SerializationExtension(context.system))
  .asInstanceOf[StandaloneRecoveryModeFactory]
(factory.createPersistenceEngine(), 
factory.createLeaderElectionAgent(this))
{code}
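
For reference, a likely fix is sketched below (an assumption-laden sketch: it presumes the custom factory declares a constructor taking a SparkConf and an akka Serialization): look the constructor up by those declared types rather than via {{conf.getClass}} and {{Serialization.getClass}}, which resolve to runtime/companion-object classes and will not match a user-defined (SparkConf, Serialization) constructor.

{code}
  case "CUSTOM" =>
    // Look up the constructor by the declared parameter types.
    val clazz = Class.forName(conf.get("spark.deploy.recoveryMode.factory"))
    val factory = clazz.getConstructor(classOf[SparkConf], classOf[Serialization])
      .newInstance(conf, SerializationExtension(context.system))
      .asInstanceOf[StandaloneRecoveryModeFactory]
    (factory.createPersistenceEngine(), factory.createLeaderElectionAgent(this))
{code}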







[jira] [Created] (SPARK-7437) Fold "literal in (item1, item2, ..., literal, ...)" into false directly if not in.

2015-05-06 Thread Zhongshuai Pei (JIRA)
Zhongshuai Pei created SPARK-7437:
-

 Summary: Fold "literal in (item1, item2, ..., literal, ...)" into 
false directly if not in.
 Key: SPARK-7437
 URL: https://issues.apache.org/jira/browse/SPARK-7437
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Zhongshuai Pei









[jira] [Assigned] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley reassigned SPARK-7431:


Assignee: Joseph K. Bradley

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Assigned] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7431:
---

Assignee: Apache Spark

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Commented] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532114#comment-14532114
 ] 

Apache Spark commented on SPARK-7431:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/5968

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Assigned] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7431:
---

Assignee: (was: Apache Spark)

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Updated] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7431:
-
Priority: Major  (was: Critical)

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Updated] (SPARK-7431) PySpark CrossValidatorModel needs to call parent init

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7431:
-
Summary: PySpark CrossValidatorModel needs to call parent init  (was: 
cvModel does not have uid in Python doc test)

> PySpark CrossValidatorModel needs to call parent init
> -
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!






[jira] [Commented] (SPARK-7183) Memory leak in netty shuffle with spark standalone cluster

2015-05-06 Thread Jack Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532108#comment-14532108
 ] 

Jack Hu commented on SPARK-7183:


Hi, [~sowen]

Do we plan to add this to the 1.3 branch, if there is any plan to release another 
1.3.x maintenance release such as 1.3.2?


> Memory leak in netty shuffle with spark standalone cluster
> --
>
> Key: SPARK-7183
> URL: https://issues.apache.org/jira/browse/SPARK-7183
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 1.3.0
>Reporter: Jack Hu
>Assignee: Liang-Chi Hsieh
>  Labels: memory-leak, netty, shuffle
> Fix For: 1.4.0
>
>
> There is a slow leak in the netty shuffle with a Spark standalone cluster, in 
> {{TransportRequestHandler.streamIds}}.
> In a Spark cluster, there are reusable netty connections between two block 
> managers used to get/send blocks between workers/drivers. These connections are 
> handled by {{org.apache.spark.network.server.TransportRequestHandler}} on the 
> server side. This handler keeps track of all the stream ids negotiated by RPC 
> when shuffle data needs to be transferred between these two block managers; the 
> stream ids keep increasing and never get a chance to be deleted unless the 
> connection is dropped (which seems never to happen in normal running).
> Here are some detailed logs of this {{TransportRequestHandler}} (note: we added 
> a log printing the total size of {{TransportRequestHandler.streamIds}}; the 
> log is "Current set size is N of 
> org.apache.spark.network.server.TransportRequestHandler@ADDRESS", and this set 
> size keeps increasing in our test):
> {quote}
> 15/04/22 21:00:16 DEBUG TransportServer: Shuffle server started on port :46288
> 15/04/22 21:00:16 INFO NettyBlockTransferService: Server created on 46288
> 15/04/22 21:00:31 INFO TransportRequestHandler: Created 
> TransportRequestHandler 
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:32 TRACE MessageDecoder: Received message RpcRequest: 
> RpcRequest\{requestId=6655045571437304938, message=[B@59778678\}
> 15/04/22 21:00:32 TRACE NettyBlockRpcServer: Received request: 
> OpenBlocks\{appId=app-20150422210016-, execId=, 
> blockIds=[broadcast_1_piece0]}
> 15/04/22 21:00:32 TRACE NettyBlockRpcServer: Registered streamId 
> 1387459488000 with 1 buffers
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Sent result 
> RpcResponse\{requestId=6655045571437304938, response=[B@d2840b\} to client 
> /10.111.7.150:33802
> 15/04/22 21:00:33 TRACE MessageDecoder: Received message ChunkFetchRequest: 
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459488000, 
> chunkIndex=0}}
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Received req from 
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459488000, 
> chunkIndex=0\}
> 15/04/22 21:00:33 INFO TransportRequestHandler: Current set size is 1 of 
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:33 TRACE OneForOneStreamManager: Removing stream id 
> 1387459488000
> 15/04/22 21:00:33 TRACE TransportRequestHandler: Sent result 
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459488000, 
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0 
> lim=3839 cap=3839]}} to client /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message RpcRequest: 
> RpcRequest\{requestId=6660601528868866371, message=[B@42bed1b8\}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Received request: 
> OpenBlocks\{appId=app-20150422210016-, execId=, 
> blockIds=[broadcast_3_piece0]}
> 15/04/22 21:00:34 TRACE NettyBlockRpcServer: Registered streamId 
> 1387459488001 with 1 buffers
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Sent result 
> RpcResponse\{requestId=6660601528868866371, response=[B@7fa3fb60\} to client 
> /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message ChunkFetchRequest: 
> ChunkFetchRequest\{streamChunkId=StreamChunkId\{streamId=1387459488001, 
> chunkIndex=0}}
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Received req from 
> /10.111.7.150:33802 to fetch block StreamChunkId\{streamId=1387459488001, 
> chunkIndex=0\}
> 15/04/22 21:00:34 INFO TransportRequestHandler: Current set size is 2 of 
> org.apache.spark.network.server.TransportRequestHandler@29a4f3e7
> 15/04/22 21:00:34 TRACE OneForOneStreamManager: Removing stream id 
> 1387459488001
> 15/04/22 21:00:34 TRACE TransportRequestHandler: Sent result 
> ChunkFetchSuccess\{streamChunkId=StreamChunkId\{streamId=1387459488001, 
> chunkIndex=0}, buffer=NioManagedBuffer\{buf=java.nio.HeapByteBuffer[pos=0 
> lim=4277 cap=4277]}} to client /10.111.7.150:33802
> 15/04/22 21:00:34 TRACE MessageDecoder: Received message RpcRequest: 
> RpcReq

[jira] [Commented] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-06 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532106#comment-14532106
 ] 

Reynold Xin commented on SPARK-7230:


We should hide them for now.  As a matter of fact, I think those shouldn't even 
exist in the Scala/Python version of DataFrames, but those are hard to remove 
now.


> Make RDD API private in SparkR for Spark 1.4
> 
>
> Key: SPARK-7230
> URL: https://issues.apache.org/jira/browse/SPARK-7230
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Critical
> Fix For: 1.4.0
>
>
> This ticket proposes making the RDD API in SparkR private for the 1.4 
> release. The motivation for doing so is discussed in a larger design 
> document aimed at a more top-down design of the SparkR APIs. A first cut that 
> discusses motivation and proposed changes can be found at http://goo.gl/GLHKZI
> The main points in that document that relate to this ticket are:
> - The RDD API requires knowledge of the distributed system and is pretty low 
> level. This is not very suitable for a number of R users who are used to more 
> high-level packages that work out of the box.
> - The RDD implementation in SparkR is not fully robust right now: we are 
> missing features like spilling for aggregation, handling partitions which 
> don't fit in memory etc. There are further limitations like lack of hashCode 
> for non-native types etc. which might affect user experience.
> The only change we will make for now is to not export the RDD functions as 
> public methods in the SparkR package, and I will create another ticket to 
> discuss the public API in more detail for 1.5.






[jira] [Commented] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-06 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532101#comment-14532101
 ] 

Sun Rui commented on SPARK-7230:


One question here: there are still some basic RDD API methods provided on 
DataFrame, like map()/flatMap()/mapPartitions() and foreach(). What's our 
policy on these methods? Will we also make them private for 1.4, or will we 
support them for the long term?


> Make RDD API private in SparkR for Spark 1.4
> 
>
> Key: SPARK-7230
> URL: https://issues.apache.org/jira/browse/SPARK-7230
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Critical
> Fix For: 1.4.0
>
>
> This ticket proposes making the RDD API in SparkR private for the 1.4 
> release. The motivation for doing so is discussed in a larger design 
> document aimed at a more top-down design of the SparkR APIs. A first cut that 
> discusses motivation and proposed changes can be found at http://goo.gl/GLHKZI
> The main points in that document that relate to this ticket are:
> - The RDD API requires knowledge of the distributed system and is pretty low 
> level. This is not very suitable for a number of R users who are used to more 
> high-level packages that work out of the box.
> - The RDD implementation in SparkR is not fully robust right now: we are 
> missing features like spilling for aggregation, handling partitions which 
> don't fit in memory etc. There are further limitations like lack of hashCode 
> for non-native types etc. which might affect user experience.
> The only change we will make for now is to not export the RDD functions as 
> public methods in the SparkR package, and I will create another ticket to 
> discuss the public API in more detail for 1.5.






[jira] [Assigned] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7262:
---

Assignee: (was: Apache Spark)

> Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML 
> package
> 
>
> Key: SPARK-7262
> URL: https://issues.apache.org/jira/browse/SPARK-7262
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: DB Tsai
>
> 1) Handle scaling and addBias internally. 
> 2) L1/L2 elasticnet using OWLQN optimizer.






[jira] [Commented] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532088#comment-14532088
 ] 

Apache Spark commented on SPARK-7262:
-

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/5967

> Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML 
> package
> 
>
> Key: SPARK-7262
> URL: https://issues.apache.org/jira/browse/SPARK-7262
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: DB Tsai
>
> 1) Handle scaling and addBias internally. 
> 2) L1/L2 elasticnet using OWLQN optimizer.






[jira] [Assigned] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7262:
---

Assignee: Apache Spark

> Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML 
> package
> 
>
> Key: SPARK-7262
> URL: https://issues.apache.org/jira/browse/SPARK-7262
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: DB Tsai
>Assignee: Apache Spark
>
> 1) Handle scaling and addBias internally. 
> 2) L1/L2 elasticnet using OWLQN optimizer.






[jira] [Created] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark

2015-05-06 Thread Sun Rui (JIRA)
Sun Rui created SPARK-7435:
--

 Summary: Make DataFrame.show() consistent with that of Scala and 
pySpark
 Key: SPARK-7435
 URL: https://issues.apache.org/jira/browse/SPARK-7435
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Sun Rui
Priority: Blocker


Currently in SparkR, DataFrame has two methods show() and showDF(). show() 
prints the DataFrame column names and types and showDF() prints the first 
numRows rows of a DataFrame.

In Scala and pySpark, show() is used to print rows of a DataFrame. 

We'd better keep the API consistent unless there is some important reason not to, 
so I propose to interchange the names (show() and showDF()) in SparkR.
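
For reference, a quick sketch of the Scala-side convention this proposal would 
match (df is any DataFrame):

{code}
df.show()         // prints the first rows of the DataFrame
df.printSchema()  // prints the column names and types (what SparkR's show() does today)
{code}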






[jira] [Resolved] (SPARK-5938) Generate row from json efficiently

2015-05-06 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-5938.
-
   Resolution: Fixed
Fix Version/s: 1.4.0

It has been resolved by 
https://github.com/apache/spark/commit/2d6612cc8b98f767d73c4d15e4065bf3d6c12ea7.

> Generate row from json efficiently
> --
>
> Key: SPARK-5938
> URL: https://issues.apache.org/jira/browse/SPARK-5938
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Nathan Howell
>Priority: Minor
> Fix For: 1.4.0
>
>
> Generate row from json efficiently in JsonRDD object.






[jira] [Resolved] (SPARK-5443) jsonRDD with schema should ignore sub-objects that are omitted in schema

2015-05-06 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-5443.
-
   Resolution: Fixed
Fix Version/s: 1.4.0

It has been resolved by 
https://github.com/apache/spark/commit/2d6612cc8b98f767d73c4d15e4065bf3d6c12ea7.

> jsonRDD with schema should ignore sub-objects that are omitted in schema
> 
>
> Key: SPARK-5443
> URL: https://issues.apache.org/jira/browse/SPARK-5443
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Derrick Burns
>Assignee: Nathan Howell
> Fix For: 1.4.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Reading the code for jsonRDD, it appears that all fields of a JSON object are 
> read into a ROW independent of the provided schema. I would expect it to be 
> more efficient to only store in the ROW those fields that are explicitly 
> included in the schema. 
> For example, assume that I only wish to extract the "id" field of a tweet.  
> If I provided a schema that simply had one field within a map named "id", 
> then the row object would only store that field within a map.  






[jira] [Commented] (SPARK-6812) filter() on DataFrame does not work as expected

2015-05-06 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532070#comment-14532070
 ] 

Sun Rui commented on SPARK-6812:


[~shivaram], yes, I agree. It seems there are still two methods, sampleDF() and 
saveDF(); can we change them back to sample() and save()?


> filter() on DataFrame does not work as expected
> ---
>
> Key: SPARK-6812
> URL: https://issues.apache.org/jira/browse/SPARK-6812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Davies Liu
>Assignee: Sun Rui
>Priority: Blocker
> Fix For: 1.4.0
>
>
> {code}
> > filter(df, df$age > 21)
> Error in filter(df, df$age > 21) :
>   no method for coercing this S4 class to a vector
> {code}






[jira] [Created] (SPARK-7434) DSL for Pipeline assembly

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7434:


 Summary: DSL for Pipeline assembly
 Key: SPARK-7434
 URL: https://issues.apache.org/jira/browse/SPARK-7434
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Joseph K. Bradley


This will require a design doc to figure out the DSL and figure out how to 
avoid conflicts in parameters for input and output columns.
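
For context, below is a minimal sketch of how a pipeline is assembled with the 
current ml API (the stages chosen here are only illustrative); this manual column 
wiring is what a fluent DSL would aim to streamline:

{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Today, input/output columns are wired by hand on each stage, which is where
// the parameter conflicts mentioned above can arise.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
{code}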






[jira] [Commented] (SPARK-5874) How to improve the current ML pipeline API?

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532067#comment-14532067
 ] 

Joseph K. Bradley commented on SPARK-5874:
--

[~eronwright]  I think that's been mentioned somewhere (a design doc), but I 
agree this will be *very* helpful.  I'll add a JIRA for it.

> How to improve the current ML pipeline API?
> ---
>
> Key: SPARK-5874
> URL: https://issues.apache.org/jira/browse/SPARK-5874
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Critical
>
> I created this JIRA to collect feedback about the ML pipeline API we 
> introduced in Spark 1.2. The target is to graduate this set of APIs in 1.4 
> with confidence, which requires valuable input from the community. I'll 
> create sub-tasks for each major issue.
> Design doc (WIP): 
> https://docs.google.com/a/databricks.com/document/d/1plFBPJY_PriPTuMiFYLSm7fQgD1FieP4wt3oMVKMGcc/edit#






[jira] [Updated] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package

2015-05-06 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-7262:
---
Description: 
1) Handle scaling and addBias internally. 
2) L1/L2 elasticnet using OWLQN optimizer.

  was:
1) Handle scaling and addBias internally. 
2) L1/L2 elasticnet using OWLQN optimizer.
3) Initial weights should be computed from prior probabilities. 
4) Ideally supports multinomial version in this PR. It will depend if ML api 
support multi-class classification. 


> Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML 
> package
> 
>
> Key: SPARK-7262
> URL: https://issues.apache.org/jira/browse/SPARK-7262
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: DB Tsai
>
> 1) Handle scaling and addBias internally. 
> 2) L1/L2 elasticnet using OWLQN optimizer.






[jira] [Updated] (SPARK-7262) Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package

2015-05-06 Thread DB Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-7262:
---
Summary: Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in 
new ML package  (was: LogisticRegression with L1/L2 (elastic net) using OWLQN 
in new ML package)

> Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML 
> package
> 
>
> Key: SPARK-7262
> URL: https://issues.apache.org/jira/browse/SPARK-7262
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: DB Tsai
>
> 1) Handle scaling and addBias internally. 
> 2) L1/L2 elasticnet using OWLQN optimizer.
> 3) Initial weights should be computed from prior probabilities. 
> 4) Ideally supports multinomial version in this PR. It will depend if ML api 
> support multi-class classification. 






[jira] [Issue Comment Deleted] (SPARK-7399) Master fails on 2.11 with compilation error

2015-05-06 Thread Tijo Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tijo Thomas updated SPARK-7399:
---
Comment: was deleted

(was: Raised a pull request https://github.com/apache/spark/pull/5966)

> Master fails on 2.11 with compilation error
> ---
>
> Key: SPARK-7399
> URL: https://issues.apache.org/jira/browse/SPARK-7399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Iulian Dragos
>
> The current code in master (and 1.4 branch) fails on 2.11 with the following 
> compilation error:
> {code}
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in 
> object RDDOperationScope, multiple overloaded alternatives of method 
> withScope define default arguments.
> [error] private[spark] object RDDOperationScope {
> [error]   ^
> {code}






[jira] [Commented] (SPARK-7399) Master fails on 2.11 with compilation error

2015-05-06 Thread Tijo Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532062#comment-14532062
 ] 

Tijo Thomas commented on SPARK-7399:


Raised a pull request https://github.com/apache/spark/pull/5966

> Master fails on 2.11 with compilation error
> ---
>
> Key: SPARK-7399
> URL: https://issues.apache.org/jira/browse/SPARK-7399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Iulian Dragos
>
> The current code in master (and 1.4 branch) fails on 2.11 with the following 
> compilation error:
> {code}
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in 
> object RDDOperationScope, multiple overloaded alternatives of method 
> withScope define default arguments.
> [error] private[spark] object RDDOperationScope {
> [error]   ^
> {code}
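
For context, the Scala restriction behind this error can be reproduced with a 
minimal sketch (illustrative names, not the actual RDDOperationScope signatures): 
only one overloaded alternative of a method may declare default arguments, so the 
second definition below is rejected by the compiler.

{code}
object Example {
  // compiles fine on its own
  def withScope(name: String = "scope"): String = name

  // error: in object Example, multiple overloaded alternatives of method
  // withScope define default arguments
  def withScope(name: String, allowNesting: Boolean = false): String = name
}
{code}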






[jira] [Assigned] (SPARK-7399) Master fails on 2.11 with compilation error

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7399:
---

Assignee: (was: Apache Spark)

> Master fails on 2.11 with compilation error
> ---
>
> Key: SPARK-7399
> URL: https://issues.apache.org/jira/browse/SPARK-7399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Iulian Dragos
>
> The current code in master (and 1.4 branch) fails on 2.11 with the following 
> compilation error:
> {code}
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in 
> object RDDOperationScope, multiple overloaded alternatives of method 
> withScope define default arguments.
> [error] private[spark] object RDDOperationScope {
> [error]   ^
> {code}






[jira] [Commented] (SPARK-7399) Master fails on 2.11 with compilation error

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532061#comment-14532061
 ] 

Apache Spark commented on SPARK-7399:
-

User 'tijoparacka' has created a pull request for this issue:
https://github.com/apache/spark/pull/5966

> Master fails on 2.11 with compilation error
> ---
>
> Key: SPARK-7399
> URL: https://issues.apache.org/jira/browse/SPARK-7399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Iulian Dragos
>
> The current code in master (and 1.4 branch) fails on 2.11 with the following 
> compilation error:
> {code}
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in 
> object RDDOperationScope, multiple overloaded alternatives of method 
> withScope define default arguments.
> [error] private[spark] object RDDOperationScope {
> [error]   ^
> {code}






[jira] [Updated] (SPARK-6812) filter() on DataFrame does not work as expected

2015-05-06 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-6812:
-
Fix Version/s: 1.4.0

> filter() on DataFrame does not work as expected
> ---
>
> Key: SPARK-6812
> URL: https://issues.apache.org/jira/browse/SPARK-6812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Davies Liu
>Assignee: Sun Rui
>Priority: Blocker
> Fix For: 1.4.0
>
>
> {code}
> > filter(df, df$age > 21)
> Error in filter(df, df$age > 21) :
>   no method for coercing this S4 class to a vector
> {code}






[jira] [Assigned] (SPARK-7399) Master fails on 2.11 with compilation error

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7399:
---

Assignee: Apache Spark

> Master fails on 2.11 with compilation error
> ---
>
> Key: SPARK-7399
> URL: https://issues.apache.org/jira/browse/SPARK-7399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Iulian Dragos
>Assignee: Apache Spark
>
> The current code in master (and 1.4 branch) fails on 2.11 with the following 
> compilation error:
> {code}
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala:78: in 
> object RDDOperationScope, multiple overloaded alternatives of method 
> withScope define default arguments.
> [error] private[spark] object RDDOperationScope {
> [error]   ^
> {code}






[jira] [Resolved] (SPARK-6812) filter() on DataFrame does not work as expected

2015-05-06 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-6812.
--
Resolution: Fixed

> filter() on DataFrame does not work as expected
> ---
>
> Key: SPARK-6812
> URL: https://issues.apache.org/jira/browse/SPARK-6812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Davies Liu
>Assignee: Sun Rui
>Priority: Blocker
>
> {code}
> > filter(df, df$age > 21)
> Error in filter(df, df$age > 21) :
>   no method for coercing this S4 class to a vector
> {code}






[jira] [Commented] (SPARK-6812) filter() on DataFrame does not work as expected

2015-05-06 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532057#comment-14532057
 ] 

Shivaram Venkataraman commented on SPARK-6812:
--

Fixed by https://github.com/apache/spark/pull/5938

> filter() on DataFrame does not work as expected
> ---
>
> Key: SPARK-6812
> URL: https://issues.apache.org/jira/browse/SPARK-6812
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Davies Liu
>Assignee: Sun Rui
>Priority: Blocker
>
> {code}
> > filter(df, df$age > 21)
> Error in filter(df, df$age > 21) :
>   no method for coercing this S4 class to a vector
> {code}






[jira] [Commented] (SPARK-5874) How to improve the current ML pipeline API?

2015-05-06 Thread Eron Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532054#comment-14532054
 ] 

Eron Wright  commented on SPARK-5874:
-

I suggest providing a fluent syntax or dsl for pipeline assembly.

> How to improve the current ML pipeline API?
> ---
>
> Key: SPARK-5874
> URL: https://issues.apache.org/jira/browse/SPARK-5874
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Critical
>
> I created this JIRA to collect feedback about the ML pipeline API we 
> introduced in Spark 1.2. The target is to graduate this set of APIs in 1.4 
> with confidence, which requires valuable input from the community. I'll 
> create sub-tasks for each major issue.
> Design doc (WIP): 
> https://docs.google.com/a/databricks.com/document/d/1plFBPJY_PriPTuMiFYLSm7fQgD1FieP4wt3oMVKMGcc/edit#






[jira] [Created] (SPARK-7433) How to pass the parameters to spark SQL backend and set its value to the environment variable through the simba ODBC driver.

2015-05-06 Thread vincent zhao (JIRA)
vincent zhao created SPARK-7433:
---

 Summary: How to pass the parameters to  spark SQL backend and set 
its value to the environment variable  through the simba ODBC driver.
 Key: SPARK-7433
 URL: https://issues.apache.org/jira/browse/SPARK-7433
 Project: Spark
  Issue Type: Question
  Components: Java API
Affects Versions: 1.3.0
Reporter: vincent zhao









[jira] [Commented] (SPARK-7393) How to improve Spark SQL performance?

2015-05-06 Thread Liang Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532044#comment-14532044
 ] 

Liang Lee commented on SPARK-7393:
--

Dear Dennis, thank you very much for your kind help.
The data is loaded from HDFS, which stores its data on a Samsung 840 Pro SSD. We use 
the following method to do the query and get the above results:

 val ds = sqlContext.parquetFile(databasepath + item + ".parquet")
ds.registerTempTable(item)
sqlContext.cacheTable(item)
var rs= sqlContext.sql("SELECT * FROM DBA WHERE CHROM=? AND POS=? ")
var rst= rs.collect()

The schema of the file is like :
 |-- CHROM: string (nullable = true)
 |-- POS: string (nullable = true)
 |-- ID: string (nullable = true)
 |-- REF: string (nullable = true)
 |-- ALT: string (nullable = true)
 |-- QUAL: string (nullable = true)
 |-- FILTER: string (nullable = true)
 |-- INFO: string (nullable = true)

Also, i"m trying your suggestion .But how to get the accurate query?
The statement  selection = df.where("CHROM=16")  returns error:
:22: error: type mismatch;
 found   : String("CHROM=\'16\'")
 required: org.apache.spark.sql.Column

How to write the expression?
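
A minimal sketch of the Column-based form (assuming the Spark 1.3 DataFrame API; 
the POS value below is only an illustrative placeholder):

{code}
// where() expects a Column, so build the predicate from Column expressions:
val selection = ds.where(ds("CHROM") === "16" && ds("POS") === "12345")

// filter() also accepts a SQL-style expression string:
val selection2 = ds.filter("CHROM = '16' AND POS = '12345'")
{code}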



> How to improve Spark SQL performance?
> -
>
> Key: SPARK-7393
> URL: https://issues.apache.org/jira/browse/SPARK-7393
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang Lee
>
> We want to use Spark SQL in our project, but we found that Spark SQL 
> performance is not as good as we expected. The details are as follows:
>  1. We save data as parquet files on HDFS.
>  2. We just select one or several rows from the parquet file using Spark SQL.
>  3. When the total record number is 61 million, it needs about 3 seconds to 
> get the result, which is unacceptably long for our scenario. 
> 4. When the total record number is 2 million, it needs about 93 ms to get the 
> result, which is still a little long for us.
>  5. The query statement is like: SELECT * FROM DBA WHERE COLA=? AND COLB=? 
> And the table is not complex: it has fewer than 10 columns and the content of 
> each column is less than 100 bytes.
>  6. Does anyone know how to improve the performance or have some other ideas?
>  7. Can Spark SQL support microsecond-level response? 






[jira] [Commented] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532017#comment-14532017
 ] 

Joseph K. Bradley commented on SPARK-7432:
--

It happened again: 
[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32067/console]

> Flaky test in PySpark CrossValidator doc test
> -
>
> Key: SPARK-7432
> URL: https://issues.apache.org/jira/browse/SPARK-7432
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Xiangrui Meng
>Priority: Critical
>
> There was a test failure in the doc test in Python CrossValidator:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]
> Here's the full doc test:
> {code}
> >>> from pyspark.ml.classification import LogisticRegression
> >>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> >>> from pyspark.mllib.linalg import Vectors
> >>> dataset = sqlContext.createDataFrame(
> ... [(Vectors.dense([0.0, 1.0]), 0.0),
> ...  (Vectors.dense([1.0, 2.0]), 1.0),
> ...  (Vectors.dense([0.55, 3.0]), 0.0),
> ...  (Vectors.dense([0.45, 4.0]), 1.0),
> ...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
> ... ["features", "label"])
> >>> lr = LogisticRegression()
> >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
> >>> evaluator = BinaryClassificationEvaluator()
> >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
> evaluator=evaluator)
> >>> cvModel = cv.fit(dataset)
> >>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
> >>> cvModel.transform(dataset).collect() == expected.collect()
> True
> {code}
> Here's the failure message:
> {code}
> Running test: pyspark/ml/tuning.py ... 
> **
> File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
> Failed example:
> cvModel.transform(dataset).collect() == expected.collect()
> Expected:
> True
> Got:
> False
> **
>1 of  11 in __main__.CrossValidator
> ***Test Failed*** 1 failures.
> Had test failures; see logs.
> [error] Got a return code of 255 on line 240 of the run-tests script.
> {code}






[jira] [Commented] (SPARK-5213) Pluggable SQL Parser Support

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531988#comment-14531988
 ] 

Apache Spark commented on SPARK-5213:
-

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/5965

> Pluggable SQL Parser Support
> 
>
> Key: SPARK-5213
> URL: https://issues.apache.org/jira/browse/SPARK-5213
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Cheng Hao
>Assignee: Cheng Hao
> Fix For: 1.4.0
>
>
> Currently, the SQL Parser dialect is hard code in SQLContext, which is not 
> easy to extend, we need the features like:
> bin/spark-sql --driver-class-path customizedSQL92.jar
> -- switch to "hiveql" dialect
>spark-sql>SET spark.sql.dialect=hiveql;
>spark-sql>SELECT * FROM src LIMIT 1;
> -- switch to "sql" dialect
>spark-sql>SET spark.sql.dialect=sql;
>spark-sql>SELECT * FROM src LIMIT 1;
> -- register the new SQL dialect
>spark-sql> SET spark.sql.dialect.sql99=com.xxx.xxx.SQL99Dialect;
>spark-sql> SET spark.sql.dialect=sql99;
>spark-sql> SELECT * FROM src LIMIT 1;
> -- register the non-exist SQL dialect
>spark-sql> SET spark.sql.dialect.sql92=NotExistedClass;
>spark-sql> SET spark.sql.dialect=sql92;
>spark-sql> SELECT * FROM src LIMIT 1;
> -- Exception will be thrown and switch to dialect "sql" (for SQLContext) or 
> "hiveql" (for HiveContext)






[jira] [Updated] (SPARK-7308) Should there be multiple concurrent attempts for one stage?

2015-05-06 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated SPARK-7308:

Description: 
Currently, when there is a fetch failure, you can end up with multiple 
concurrent attempts for the same stage.  Is this intended?  At best, it leads 
to some very confusing behavior, and it makes it hard for the user to make 
sense of what is going on.  At worst, I think this is the cause of some very 
strange errors we've seen from users, where stages start 
executing before all the dependent stages have completed.

This can happen in the following scenario:  there is a fetch failure in attempt 
0, so the stage is retried.  attempt 1 starts.  But, tasks from attempt 0 are 
still running -- some of them can also hit fetch failures after attempt 1 
starts.  That will cause additional stage attempts to get fired up.

There is an attempt to handle this already 
https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105

but that only checks whether the **stage** is running.  It really should check 
whether that **attempt** is still running, but there isn't enough info to do 
that.  

I'll also post some info on how to reproduce this.

  was:
Currently, when there is a fetch failure, you can end up with multiple 
concurrent attempts for the same stage.  Is this intended?  At best, it leads 
to some very confusing behavior, and it makes it hard for the user to make 
sense of what is going on.  At worst, I think this is the cause of some very 
strange errors we've seen from users, where stages start 
executing before all the dependent stages have completed.

This can happen in the following scenario:  there is a fetch failure in attempt 
0, so the stage is retried.  attempt 1 starts.  But, tasks from attempt 0 are 
still running -- some of them can also hit fetch failures after attempt 1 
starts.  That will cause additional stage attempts to get fired up.

There is an attempt to handle this already 
https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105

but that only checks whether the **stage** is running.  It really should check 
whether that **attempt** is still running, but there isn't enough info to do 
that.

Given the release timeline, I'm going to submit a PR to just fail fast as soon 
as we detect there are multiple concurrent attempts.  Would like some feedback 
from others on whether or not this is a good thing to do.  (The crazy thing is, 
when I reproduce this, spark seems to actually do the right thing despite the 
multiple attempts at the same stage, but I feel like that is probably dumb luck 
from what I've been testing.)

I'll also post some info on how to reproduce this.  Finally, if there really 
shouldn't be multiple concurrent attempts, then we can open another ticket for 
the proper fix (as opposed to just failing fast) after the 1.4 release.


> Should there be multiple concurrent attempts for one stage?
> ---
>
> Key: SPARK-7308
> URL: https://issues.apache.org/jira/browse/SPARK-7308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>
> Currently, when there is a fetch failure, you can end up with multiple 
> concurrent attempts for the same stage.  Is this intended?  At best, it leads 
> to some very confusing behavior, and it makes it hard for the user to make 
> sense of what is going on.  At worst, I think this is the cause of some very 
> strange errors we've seen from users, where stages start 
> executing before all the dependent stages have completed.
> This can happen in the following scenario:  there is a fetch failure in 
> attempt 0, so the stage is retried.  attempt 1 starts.  But, tasks from 
> attempt 0 are still running -- some of them can also hit fetch failures after 
> attempt 1 starts.  That will cause additional stage attempts to get fired up.
> There is an attempt to handle this already 
> https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105
> but that only checks whether the **stage** is running.  It really should 
> check whether that **attempt** is still running, but there isn't enough info 
> to do that.  
> I'll also post some info on how to reproduce this.






[jira] [Commented] (SPARK-7308) Should there be multiple concurrent attempts for one stage?

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531969#comment-14531969
 ] 

Apache Spark commented on SPARK-7308:
-

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/5964

> Should there be multiple concurrent attempts for one stage?
> ---
>
> Key: SPARK-7308
> URL: https://issues.apache.org/jira/browse/SPARK-7308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>
> Currently, when there is a fetch failure, you can end up with multiple 
> concurrent attempts for the same stage.  Is this intended?  At best, it leads 
> to some very confusing behavior, and it makes it hard for the user to make 
> sense of what is going on.  At worst, I think this is the cause of some very 
> strange errors we've seen from users, where stages start 
> executing before all the dependent stages have completed.
> This can happen in the following scenario:  there is a fetch failure in 
> attempt 0, so the stage is retried.  attempt 1 starts.  But, tasks from 
> attempt 0 are still running -- some of them can also hit fetch failures after 
> attempt 1 starts.  That will cause additional stage attempts to get fired up.
> There is an attempt to handle this already 
> https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105
> but that only checks whether the **stage** is running.  It really should 
> check whether that **attempt** is still running, but there isn't enough info 
> to do that.
> Given the release timeline, I'm going to submit a PR to just fail fast as 
> soon as we detect there are multiple concurrent attempts.  Would like some 
> feedback from others on whether or not this is a good thing to do.  (The 
> crazy thing is, when I reproduce this, spark seems to actually do the right 
> thing despite the multiple attempts at the same stage, but I feel like that 
> is probably dumb luck from what I've been testing.)
> I'll also post some info on how to reproduce this.  Finally, if there really 
> shouldn't be multiple concurrent attempts, then we can open another ticket 
> for the proper fix (as opposed to just failing fast) after the 1.4 release.






[jira] [Assigned] (SPARK-7411) CTAS parser is incomplete

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7411:
---

Assignee: Apache Spark  (was: Cheng Hao)

> CTAS parser is incomplete
> -
>
> Key: SPARK-7411
> URL: https://issues.apache.org/jira/browse/SPARK-7411
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Michael Armbrust
>Assignee: Apache Spark
>Priority: Blocker
>
> The change to use an isolated classloader removed the use of the Semantic 
> Analyzer for parsing CTAS queries.  We should fix this before the release.
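
For readers unfamiliar with the abbreviation, CTAS is a CREATE TABLE ... AS SELECT statement; a minimal illustration, where {{sc}} is an existing SparkContext and the table and column names are placeholders:

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// CTAS: create a new Hive table from the result of a query.
hiveContext.sql("CREATE TABLE event_counts AS SELECT key, count(*) AS cnt FROM events GROUP BY key")
{code}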



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7411) CTAS parser is incomplete

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531967#comment-14531967
 ] 

Apache Spark commented on SPARK-7411:
-

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/5963

> CTAS parser is incomplete
> -
>
> Key: SPARK-7411
> URL: https://issues.apache.org/jira/browse/SPARK-7411
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Michael Armbrust
>Assignee: Cheng Hao
>Priority: Blocker
>
> The change to use an isolated classloader removed the use of the Semantic 
> Analyzer for parsing CTAS queries.  We should fix this before the release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7411) CTAS parser is incomplete

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7411:
---

Assignee: Cheng Hao  (was: Apache Spark)

> CTAS parser is incomplete
> -
>
> Key: SPARK-7411
> URL: https://issues.apache.org/jira/browse/SPARK-7411
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Michael Armbrust
>Assignee: Cheng Hao
>Priority: Blocker
>
> The change to use an isolated classloader removed the use of the Semantic 
> Analyzer for parsing CTAS queries.  We should fix this before the release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7275) Make LogicalRelation public

2015-05-06 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526799#comment-14526799
 ] 

Glenn Weidner edited comment on SPARK-7275 at 5/7/15 4:18 AM:
--

[~smolav] Can you provide an example of where being private makes it more 
difficult "to work with full logical plans from third party packages"?  Thank 
you.


was (Author: gweidner):
Santiago M. Mola - can you provide example of where being private makes it more 
difficult "to work with full logical plans from third party packages"?  Thank 
you.

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2015-05-06 Thread meiyoula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531945#comment-14531945
 ] 

meiyoula commented on SPARK-1867:
-

I have resolved my problem. 
The primary cause was actually a ClassNotFoundException. When I added the 
dependency jars to the executor classpath, everything worked.
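
For reference, a minimal sketch of one way to expose such dependency jars on the executor classpath; the jar paths below are placeholders, not taken from this report:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Ship the dependency jar with the application and also place it on the
// executor classpath. Paths are placeholders; in some deploy modes jars
// passed via setJars()/--jars already end up on the executor classpath.
val conf = new SparkConf()
  .setAppName("example")
  .setJars(Seq("/local/path/to/dependency.jar"))
  .set("spark.executor.extraClassPath", "/path/on/executors/dependency.jar")
val sc = new SparkContext(conf)
{code}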

> Spark Documentation Error causes java.lang.IllegalStateException: unread 
> block data
> ---
>
> Key: SPARK-1867
> URL: https://issues.apache.org/jira/browse/SPARK-1867
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit 
> of money), and both contractors have independently hit the following 
> exception.  What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website, 
> along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
> 2. Building a fat jar with a Spark app with sbt then trying to run it on the 
> cluster
> I've also included code snippets, and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching Spark versions on nodes/user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code 
> on other clusters while only including one jar (it's a fat jar).
> But I have no idea how to check for (a) - it appears Spark doesn't have any 
> version checks - it would be nice if it checked versions and 
> threw a "mismatching version exception: you have user code using version X 
> and node Y has version Z".
> I would be very grateful for advice on this.
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 
> 0.0:1 failed 32 times (most recent failure: Exception failure: 
> java.lang.IllegalStateException: unread block data)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to 
> java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> val conf = new SparkConf()
>.setMaster(clusterMaster)
>.setAppName(appName)
>.setSparkHome(sparkHome)
>.setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT dependencies:
> // relevant
> "org.apache.spark" % "spark-core_2.10" % "0.9.1",
> "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
> // standard, probably unrelated
> "com.github.seratch" %% "awscala" % "[0.2,)",
> "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
> "org.specs2" %% "specs2" % "1.14" % "test",
> "org.scala-lang" % "scala-reflect" % "2.10.3",
> "org.scalaz" %% "scalaz-core" % "7.0.5",
> "net.minidev" % "json-smart" % "1.2"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-7335) Submitting a query to Thrift Server occurs error: java.lang.IllegalStateException: unread block data

2015-05-06 Thread meiyoula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531944#comment-14531944
 ] 

meiyoula commented on SPARK-7335:
-

I have resolved my problem. 
The primary cause was actually a ClassNotFoundException. When I added the 
dependency jars to the executor classpath, everything worked.

> Submitting a query to Thrift Server occurs error: 
> java.lang.IllegalStateException: unread block data
> 
>
> Key: SPARK-7335
> URL: https://issues.apache.org/jira/browse/SPARK-7335
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: meiyoula
>Priority: Critical
>
> java.lang.IllegalStateException: unread block data
> at 
> java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7335) Submitting a query to Thrift Server occurs error: java.lang.IllegalStateException: unread block data

2015-05-06 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula resolved SPARK-7335.
-
Resolution: Not A Problem

> Submitting a query to Thrift Server occurs error: 
> java.lang.IllegalStateException: unread block data
> 
>
> Key: SPARK-7335
> URL: https://issues.apache.org/jira/browse/SPARK-7335
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: meiyoula
>Priority: Critical
>
> java.lang.IllegalStateException: unread block data
> at 
> java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7432:
---

Assignee: Apache Spark  (was: Xiangrui Meng)

> Flaky test in PySpark CrossValidator doc test
> -
>
> Key: SPARK-7432
> URL: https://issues.apache.org/jira/browse/SPARK-7432
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Critical
>
> There was a test failure in the doc test in Python CrossValidator:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]
> Here's the full doc test:
> {code}
> >>> from pyspark.ml.classification import LogisticRegression
> >>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> >>> from pyspark.mllib.linalg import Vectors
> >>> dataset = sqlContext.createDataFrame(
> ... [(Vectors.dense([0.0, 1.0]), 0.0),
> ...  (Vectors.dense([1.0, 2.0]), 1.0),
> ...  (Vectors.dense([0.55, 3.0]), 0.0),
> ...  (Vectors.dense([0.45, 4.0]), 1.0),
> ...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
> ... ["features", "label"])
> >>> lr = LogisticRegression()
> >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
> >>> evaluator = BinaryClassificationEvaluator()
> >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
> evaluator=evaluator)
> >>> cvModel = cv.fit(dataset)
> >>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
> >>> cvModel.transform(dataset).collect() == expected.collect()
> True
> {code}
> Here's the failure message:
> {code}
> Running test: pyspark/ml/tuning.py ... 
> **
> File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
> Failed example:
> cvModel.transform(dataset).collect() == expected.collect()
> Expected:
> True
> Got:
> False
> **
>1 of  11 in __main__.CrossValidator
> ***Test Failed*** 1 failures.
> Had test failures; see logs.
> [error] Got a return code of 255 on line 240 of the run-tests script.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7432:
---

Assignee: Xiangrui Meng  (was: Apache Spark)

> Flaky test in PySpark CrossValidator doc test
> -
>
> Key: SPARK-7432
> URL: https://issues.apache.org/jira/browse/SPARK-7432
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Xiangrui Meng
>Priority: Critical
>
> There was a test failure in the doc test in Python CrossValidator:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]
> Here's the full doc test:
> {code}
> >>> from pyspark.ml.classification import LogisticRegression
> >>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> >>> from pyspark.mllib.linalg import Vectors
> >>> dataset = sqlContext.createDataFrame(
> ... [(Vectors.dense([0.0, 1.0]), 0.0),
> ...  (Vectors.dense([1.0, 2.0]), 1.0),
> ...  (Vectors.dense([0.55, 3.0]), 0.0),
> ...  (Vectors.dense([0.45, 4.0]), 1.0),
> ...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
> ... ["features", "label"])
> >>> lr = LogisticRegression()
> >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
> >>> evaluator = BinaryClassificationEvaluator()
> >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
> evaluator=evaluator)
> >>> cvModel = cv.fit(dataset)
> >>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
> >>> cvModel.transform(dataset).collect() == expected.collect()
> True
> {code}
> Here's the failure message:
> {code}
> Running test: pyspark/ml/tuning.py ... 
> **
> File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
> Failed example:
> cvModel.transform(dataset).collect() == expected.collect()
> Expected:
> True
> Got:
> False
> **
>1 of  11 in __main__.CrossValidator
> ***Test Failed*** 1 failures.
> Had test failures; see logs.
> [error] Got a return code of 255 on line 240 of the run-tests script.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531934#comment-14531934
 ] 

Apache Spark commented on SPARK-7432:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/5962

> Flaky test in PySpark CrossValidator doc test
> -
>
> Key: SPARK-7432
> URL: https://issues.apache.org/jira/browse/SPARK-7432
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Xiangrui Meng
>Priority: Critical
>
> There was a test failure in the doc test in Python CrossValidator:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]
> Here's the full doc test:
> {code}
> >>> from pyspark.ml.classification import LogisticRegression
> >>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> >>> from pyspark.mllib.linalg import Vectors
> >>> dataset = sqlContext.createDataFrame(
> ... [(Vectors.dense([0.0, 1.0]), 0.0),
> ...  (Vectors.dense([1.0, 2.0]), 1.0),
> ...  (Vectors.dense([0.55, 3.0]), 0.0),
> ...  (Vectors.dense([0.45, 4.0]), 1.0),
> ...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
> ... ["features", "label"])
> >>> lr = LogisticRegression()
> >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
> >>> evaluator = BinaryClassificationEvaluator()
> >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
> evaluator=evaluator)
> >>> cvModel = cv.fit(dataset)
> >>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
> >>> cvModel.transform(dataset).collect() == expected.collect()
> True
> {code}
> Here's the failure message:
> {code}
> Running test: pyspark/ml/tuning.py ... 
> **
> File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
> Failed example:
> cvModel.transform(dataset).collect() == expected.collect()
> Expected:
> True
> Got:
> False
> **
>1 of  11 in __main__.CrossValidator
> ***Test Failed*** 1 failures.
> Had test failures; see logs.
> [error] Got a return code of 255 on line 240 of the run-tests script.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7008) An implementation of Factorization Machine (LibFM)

2015-05-06 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li reopened SPARK-7008:


This JIRA should not be closed.

> An implementation of Factorization Machine (LibFM)
> --
>
> Key: SPARK-7008
> URL: https://issues.apache.org/jira/browse/SPARK-7008
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.3.2
>Reporter: zhengruifeng
>  Labels: features, patch
> Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, 
> QQ20150421-2.png
>
>
> An implementation of Factorization Machines based on Scala and Spark MLlib.
> FM is a machine learning algorithm for multi-linear regression and is 
> widely used for recommendation.
> FM has performed well in recent recommendation competitions.
> Ref:
> http://libfm.org/
> http://doi.acm.org/10.1145/2168752.2168771
> http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf
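
For context, the second-order factorization machine model from the Rendle (2010) paper referenced above can be written (in LaTeX notation) as:

{code}
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j
{code}

where w_0 is the global bias, the w_i are per-feature weights, and v_i, v_j are k-dimensional latent factor vectors whose inner product models the pairwise feature interaction.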



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7431) cvModel does not have uid in Python doc test

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531929#comment-14531929
 ] 

Joseph K. Bradley commented on SPARK-7431:
--

I'm working on this

> cvModel does not have uid in Python doc test
> 
>
> Key: SPARK-7431
> URL: https://issues.apache.org/jira/browse/SPARK-7431
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Try running the CrossValidator doc test in the pyspark shell.  Then type 
> cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
> there is no uid defined!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7432:
-
Assignee: Xiangrui Meng

> Flaky test in PySpark CrossValidator doc test
> -
>
> Key: SPARK-7432
> URL: https://issues.apache.org/jira/browse/SPARK-7432
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>Assignee: Xiangrui Meng
>Priority: Critical
>
> There was a test failure in the doc test in Python CrossValidator:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]
> Here's the full doc test:
> {code}
> >>> from pyspark.ml.classification import LogisticRegression
> >>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> >>> from pyspark.mllib.linalg import Vectors
> >>> dataset = sqlContext.createDataFrame(
> ... [(Vectors.dense([0.0, 1.0]), 0.0),
> ...  (Vectors.dense([1.0, 2.0]), 1.0),
> ...  (Vectors.dense([0.55, 3.0]), 0.0),
> ...  (Vectors.dense([0.45, 4.0]), 1.0),
> ...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
> ... ["features", "label"])
> >>> lr = LogisticRegression()
> >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
> >>> evaluator = BinaryClassificationEvaluator()
> >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
> evaluator=evaluator)
> >>> cvModel = cv.fit(dataset)
> >>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
> >>> cvModel.transform(dataset).collect() == expected.collect()
> True
> {code}
> Here's the failure message:
> {code}
> Running test: pyspark/ml/tuning.py ... 
> **
> File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
> Failed example:
> cvModel.transform(dataset).collect() == expected.collect()
> Expected:
> True
> Got:
> False
> **
>1 of  11 in __main__.CrossValidator
> ***Test Failed*** 1 failures.
> Had test failures; see logs.
> [error] Got a return code of 255 on line 240 of the run-tests script.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7432) Flaky test in PySpark CrossValidator doc test

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7432:


 Summary: Flaky test in PySpark CrossValidator doc test
 Key: SPARK-7432
 URL: https://issues.apache.org/jira/browse/SPARK-7432
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 1.4.0
Reporter: Joseph K. Bradley
Priority: Critical


There was a test failure in the doc test in Python CrossValidator:
[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32058/consoleFull]

Here's the full doc test:
{code}
>>> from pyspark.ml.classification import LogisticRegression
>>> from pyspark.ml.evaluation import BinaryClassificationEvaluator
>>> from pyspark.mllib.linalg import Vectors
>>> dataset = sqlContext.createDataFrame(
... [(Vectors.dense([0.0, 1.0]), 0.0),
...  (Vectors.dense([1.0, 2.0]), 1.0),
...  (Vectors.dense([0.55, 3.0]), 0.0),
...  (Vectors.dense([0.45, 4.0]), 1.0),
...  (Vectors.dense([0.51, 5.0]), 1.0)] * 10,
... ["features", "label"])
>>> lr = LogisticRegression()
>>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1, 5]).build()
>>> evaluator = BinaryClassificationEvaluator()
>>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
evaluator=evaluator)
>>> cvModel = cv.fit(dataset)
>>> expected = lr.fit(dataset, {lr.maxIter: 5}).transform(dataset)
>>> cvModel.transform(dataset).collect() == expected.collect()
True
{code}

Here's the failure message:
{code}
Running test: pyspark/ml/tuning.py ... 
**
File "pyspark/ml/tuning.py", line 108, in __main__.CrossValidator
Failed example:
cvModel.transform(dataset).collect() == expected.collect()
Expected:
True
Got:
False
**
   1 of  11 in __main__.CrossValidator
***Test Failed*** 1 failures.
Had test failures; see logs.
[error] Got a return code of 255 on line 240 of the run-tests script.
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7431) cvModel does not have uid in Python doc test

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7431:


 Summary: cvModel does not have uid in Python doc test
 Key: SPARK-7431
 URL: https://issues.apache.org/jira/browse/SPARK-7431
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 1.4.0
Reporter: Joseph K. Bradley
Priority: Critical


Try running the CrossValidator doc test in the pyspark shell.  Then type 
cvModel to print the model.  It will fail in {{Identifiable.__repr__}} since 
there is no uid defined!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7008) An implementation of Factorization Machine (LibFM)

2015-05-06 Thread zhengruifeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng closed SPARK-7008.
---
Resolution: Fixed

> An implementation of Factorization Machine (LibFM)
> --
>
> Key: SPARK-7008
> URL: https://issues.apache.org/jira/browse/SPARK-7008
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 1.3.0, 1.3.1, 1.3.2
>Reporter: zhengruifeng
>  Labels: features, patch
> Attachments: FM_CR.xlsx, FM_convergence_rate.xlsx, QQ20150421-1.png, 
> QQ20150421-2.png
>
>
> An implementation of Factorization Machines based on Scala and Spark MLlib.
> FM is a machine learning algorithm for multi-linear regression and is 
> widely used for recommendation.
> FM has performed well in recent recommendation competitions.
> Ref:
> http://libfm.org/
> http://doi.acm.org/10.1145/2168752.2168771
> http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7430) General improvements to streaming tests to increase debuggability

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531906#comment-14531906
 ] 

Apache Spark commented on SPARK-7430:
-

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/5961

> General improvements to streaming tests to increase debuggability
> -
>
> Key: SPARK-7430
> URL: https://issues.apache.org/jira/browse/SPARK-7430
> Project: Spark
>  Issue Type: Test
>  Components: Streaming, Tests
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7430) General improvements to streaming tests to increase debuggability

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7430:
-
Priority: Critical  (was: Major)

> General improvements to streaming tests to increase debuggability
> -
>
> Key: SPARK-7430
> URL: https://issues.apache.org/jira/browse/SPARK-7430
> Project: Spark
>  Issue Type: Test
>  Components: Streaming, Tests
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7430) General improvements to streaming tests to increase debuggability

2015-05-06 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-7430:


 Summary: General improvements to streaming tests to increase 
debuggability
 Key: SPARK-7430
 URL: https://issues.apache.org/jira/browse/SPARK-7430
 Project: Spark
  Issue Type: Test
  Components: Streaming, Tests
Reporter: Tathagata Das
Assignee: Tathagata Das






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7407) Use uid and param name to identify a parameter instead of the param object

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531898#comment-14531898
 ] 

Joseph K. Bradley commented on SPARK-7407:
--

I hope we can make this change without changing the user-facing API.  That 
seems very doable for Scala, where ParamMap is a class.  It sounds harder for 
Python.  Should we make it a class there too?

> Use uid and param name to identify a parameter instead of the param object
> --
>
> Key: SPARK-7407
> URL: https://issues.apache.org/jira/browse/SPARK-7407
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Transferring parameter values from one to another have been the pain point in 
> the ML pipeline implementation. Because we use the param object as the key in 
> the param map, we have to correctly copy them when making a copy of the 
> transformer, estimator, and models. This becomes complicated when 
> meta-algorithms are involved. For example, in cross validation:
> {code}
> val cv = new CrossValidator()
>   .setEstimator(lr)
>   .setEstimatorParamMaps(epm)
> {code}
> When we make a copy of `cv` with extra params that contain estimator params,
> {code}
> cv.copy(ParamMap(cv.numFolds -> 3, lr.maxIter -> 10))
> {code}
> we need to make a copy of the `lr` object as well and map `epm` to use the 
> new param keys from the old `lr`. This is quite error-prone, especially if 
> the estimator itself is another meta-algorithm.
> Using uid + param name as the key in param maps and using the same uid in 
> copy (and between estimator/model pairs) would simplify the implementations. 
> We don't need to change the keys since the copied instance has the same id as 
> the original instance. And it is easier to find models from a fitted pipeline.
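
A minimal sketch (class and method names here are hypothetical, not the actual spark.ml API) of what keying a param map by (parent uid, param name) rather than by param object identity could look like:

{code}
// Hypothetical sketch: values survive copying because the copy keeps the
// same uid, so no key rewriting is required when transferring params.
final case class ParamKey(parentUid: String, paramName: String)

class UidKeyedParamMap {
  private val values = scala.collection.mutable.Map.empty[ParamKey, Any]

  def put(parentUid: String, paramName: String, value: Any): this.type = {
    values(ParamKey(parentUid, paramName)) = value
    this
  }

  def get(parentUid: String, paramName: String): Option[Any] =
    values.get(ParamKey(parentUid, paramName))
}
{code}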



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7429) Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7429:
---

Assignee: Joseph K. Bradley  (was: Apache Spark)

> Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema
> 
>
> Key: SPARK-7429
> URL: https://issues.apache.org/jira/browse/SPARK-7429
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Params.setDefault taking a set of ParamPairs should be annotated with 
> varargs.  I thought it would not work before, but it apparently does.
> CrossValidator.transform should call transformSchema since the underlying 
> Model might be a PipelineModel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7429) Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531893#comment-14531893
 ] 

Apache Spark commented on SPARK-7429:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/5960

> Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema
> 
>
> Key: SPARK-7429
> URL: https://issues.apache.org/jira/browse/SPARK-7429
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Params.setDefault taking a set of ParamPairs should be annotated with 
> varargs.  I thought it would not work before, but it apparently does.
> CrossValidator.transform should call transformSchema since the underlying 
> Model might be a PipelineModel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7429) Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7429:
---

Assignee: Apache Spark  (was: Joseph K. Bradley)

> Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema
> 
>
> Key: SPARK-7429
> URL: https://issues.apache.org/jira/browse/SPARK-7429
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> Params.setDefault taking a set of ParamPairs should be annotated with 
> varargs.  I thought it would not work before, but it apparently does.
> CrossValidator.transform should call transformSchema since the underlying 
> Model might be a PipelineModel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7429) Cleanups: Params.setDefault varargs, CrossValidatorModel transformSchema

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7429:


 Summary: Cleanups: Params.setDefault varargs, CrossValidatorModel 
transformSchema
 Key: SPARK-7429
 URL: https://issues.apache.org/jira/browse/SPARK-7429
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Minor


Params.setDefault taking a set of ParamPairs should be annotated with varargs.  
I thought it would not work before, but it apparently does.

CrossValidator.transform should call transformSchema since the underlying Model 
might be a PipelineModel
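
For illustration, a minimal sketch of the Scala varargs annotation in question; the signature below is simplified and not copied from the ML codebase:

{code}
import scala.annotation.varargs

class Defaults {
  // @varargs generates a Java-friendly overload taking an array, so the
  // repeated-parameter version is callable from both Scala and Java.
  @varargs
  def setDefault(pairs: (String, Any)*): this.type = {
    pairs.foreach { case (name, value) => println(s"default: $name = $value") }
    this
  }
}
{code}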



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7411) CTAS parser is incomplete

2015-05-06 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-7411:

Assignee: Cheng Hao

> CTAS parser is incomplete
> -
>
> Key: SPARK-7411
> URL: https://issues.apache.org/jira/browse/SPARK-7411
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Michael Armbrust
>Assignee: Cheng Hao
>Priority: Blocker
>
> The change to use an isolated classloader removed the use of the Semantic 
> Analyzer for parsing CTAS queries.  We should fix this before the release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7428) DataFrame.join() could create a new df with duplicate column name

2015-05-06 Thread yan tianxing (JIRA)
yan tianxing created SPARK-7428:
---

 Summary: DataFrame.join() could create a new df with duplicate 
column name
 Key: SPARK-7428
 URL: https://issues.apache.org/jira/browse/SPARK-7428
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: spark-1.3.0-bin-hadoop2.4
Reporter: yan tianxing


>val df = sc.parallelize(Array(1,2,3)).toDF("x")
>val df2 = sc.parallelize(Array(1,4,5)).toDF("x")
>val df3 = df.join(df2,df("x")===df2("x"),"inner")
>df3.show
x x
1 1

> df3.select("x")
org.apache.spark.sql.AnalysisException: Ambiguous references to x: 
(x#1,List()),(x#3,List());
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:211)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:109)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$7$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:267)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$7$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:260)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:116)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:121)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$7.applyOrElse(Analyzer.scala:260)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$7.applyOrElse(Analyzer.scala:197)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:197)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:196)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$ano
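
A common workaround, sketched here for illustration only (the renamed column name is arbitrary; {{sc}} and {{sqlContext}} are the usual shell-provided contexts), is to rename one side before joining so the resulting columns are unambiguous:

{code}
// In an application (outside spark-shell) the implicits import is needed for toDF.
import sqlContext.implicits._

// Rebuild the report's DataFrames, then rename one "x" column before the join.
val df = sc.parallelize(Array(1, 2, 3)).toDF("x")
val df2 = sc.parallelize(Array(1, 4, 5)).toDF("x")

val df2Renamed = df2.withColumnRenamed("x", "x2")
val joined = df.join(df2Renamed, df("x") === df2Renamed("x2"), "inner")
joined.select("x").show()  // no longer ambiguous
{code}
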

[jira] [Updated] (SPARK-6943) Graphically show the RDD DAG on the UI

2015-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6943:
-
Attachment: new-stage-page-5-6-15.png
new-job-page-5-6-15.png

> Graphically show the RDD DAG on the UI
> --
>
> Key: SPARK-6943
> URL: https://issues.apache.org/jira/browse/SPARK-6943
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Reporter: Patrick Wendell
>Assignee: Andrew Or
> Fix For: 1.4.0
>
> Attachments: DAGvisualizationintheSparkWebUI.pdf, job-page.png, 
> new-job-page-5-6-15.png, new-stage-page-5-6-15.png, stage-page.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7217) Add configuration to control the default behavior of StreamingContext.stop() implicitly calling SparkContext.stop()

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7217:
-
Priority: Blocker  (was: Major)
Target Version/s: 1.4.0

> Add configuration to control the default behavior of StreamingContext.stop() 
> implicitly calling SparkContext.stop()
> ---
>
> Key: SPARK-7217
> URL: https://issues.apache.org/jira/browse/SPARK-7217
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.3.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Blocker
>
> In environments like notebooks, the SparkContext is managed by the underlying 
> infrastructure and it is expected that the SparkContext will not be stopped. 
> However, StreamingContext.stop() calls SparkContext.stop() as a non-intuitive 
> side-effect. This JIRA is to add a configuration in SparkConf that sets the 
> default StreamingContext stop behavior. It should be such that the existing 
> behavior does not change for existing users.
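
For reference, the per-call behavior can already be controlled today; a minimal sketch (the eventual SparkConf key proposed by this JIRA is not fixed here):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// In a notebook the SparkContext is created by the infrastructure; it is
// created explicitly here only to keep the sketch self-contained.
val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local[2]"))
val ssc = new StreamingContext(sc, Seconds(1))

// ... define and start streams ...

// Stop only the StreamingContext; keep the shared SparkContext alive.
ssc.stop(stopSparkContext = false, stopGracefully = true)
{code}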



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6656) Allow the application name to be passed in versus pulling from SparkContext.getAppName()

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-6656:
-
Assignee: Chris Fregly

> Allow the application name to be passed in versus pulling from 
> SparkContext.getAppName() 
> -
>
> Key: SPARK-6656
> URL: https://issues.apache.org/jira/browse/SPARK-6656
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: Chris Fregly
>Assignee: Chris Fregly
>
> This is useful for the scenario where Kinesis Spark Streaming is being 
> invoked from the Spark Shell.  In this case, the application name in the 
> SparkContext is pre-set to "Spark Shell".
> This isn't a common or recommended use case, but it's best to make this 
> configurable outside of SparkContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7427) Make sharedParams match in Scala, Python

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7427:
-
Description: The documentation for shared Params differs a little between 
Scala, Python.  The Python docs should be modified to match the Scala ones.  
This will require modifying the sharedParamsCodeGen files.  (was: The 
documentation for shared Params differs a little between Scala, Python.  The 
Python docs should be modified to match the Scala ones.)

> Make sharedParams match in Scala, Python
> 
>
> Key: SPARK-7427
> URL: https://issues.apache.org/jira/browse/SPARK-7427
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> The documentation for shared Params differs a little between Scala, Python.  
> The Python docs should be modified to match the Scala ones.  This will 
> require modifying the sharedParamsCodeGen files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7391) DAG visualization: open viz on stage page if from job page

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7391:
---

Assignee: Apache Spark  (was: Andrew Or)

> DAG visualization: open viz on stage page if from job page
> --
>
> Key: SPARK-7391
> URL: https://issues.apache.org/jira/browse/SPARK-7391
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Apache Spark
>Priority: Minor
>
> Right now we can click from the job page to the stage page. But as soon as 
> you get to the stage page, you will have to open the viz manually again. This 
> is annoying for users (like me) who expect that clicking from the job page 
> would expand the stage DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7427) Make sharedParams match in Scala, Python

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7427:
-
Issue Type: Documentation  (was: Improvement)

> Make sharedParams match in Scala, Python
> 
>
> Key: SPARK-7427
> URL: https://issues.apache.org/jira/browse/SPARK-7427
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> The documentation for shared Params differs a little between Scala, Python.  
> The Python docs should be modified to match the Scala ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7391) DAG visualization: open viz on stage page if from job page

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531876#comment-14531876
 ] 

Apache Spark commented on SPARK-7391:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/5958

> DAG visualization: open viz on stage page if from job page
> --
>
> Key: SPARK-7391
> URL: https://issues.apache.org/jira/browse/SPARK-7391
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> Right now we can click from the job page to the stage page. But as soon as 
> you get to the stage page, you will have to open the viz manually again. This 
> is annoying for users (like me) who expect that clicking from the job page 
> would expand the stage DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7427) Make sharedParams match in Scala, Python

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7427:
-
Labels: starter  (was: )

> Make sharedParams match in Scala, Python
> 
>
> Key: SPARK-7427
> URL: https://issues.apache.org/jira/browse/SPARK-7427
> Project: Spark
>  Issue Type: Documentation
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Priority: Trivial
>  Labels: starter
>
> The documentation for shared Params differs a little between Scala, Python.  
> The Python docs should be modified to match the Scala ones.  This will 
> require modifying the sharedParamsCodeGen files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7391) DAG visualization: open viz on stage page if from job page

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7391:
---

Assignee: Andrew Or  (was: Apache Spark)

> DAG visualization: open viz on stage page if from job page
> --
>
> Key: SPARK-7391
> URL: https://issues.apache.org/jira/browse/SPARK-7391
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> Right now we can click from the job page to the stage page. But as soon as 
> you get to the stage page, you will have to open the viz manually again. This 
> is annoying for users (like me) who expect that clicking from the job page 
> would expand the stage DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7427) Make sharedParams match in Scala, Python

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7427:


 Summary: Make sharedParams match in Scala, Python
 Key: SPARK-7427
 URL: https://issues.apache.org/jira/browse/SPARK-7427
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Joseph K. Bradley
Priority: Trivial


The documentation for shared Params differs a little between Scala, Python.  
The Python docs should be modified to match the Scala ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2015-05-06 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531875#comment-14531875
 ] 

Joseph K. Bradley commented on SPARK-7424:
--

I started a little work on this.  It will involve modifying 
PredictorParams.validateAndTransformSchema to copy metadata from the labelCol 
to the outputCol, if available.  It should not, of course, copy the column name.
The PredictionModel will need to store the labelCol attribute, if available.  
This may require modifying subclasses.  It may also require specializing 
validateAndTransformSchema for Predictor and PredictionModel (making two 
versions).
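
A rough sketch of the metadata copy using the public Spark SQL types API; the column names are placeholders and this is not the actual PredictorParams code:

{code}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Append a prediction column whose metadata (e.g. numClasses, attributes) is
// copied from the label column. "label" and "prediction" are placeholders.
def withPredictionCol(schema: StructType): StructType = {
  val labelMetadata = schema("label").metadata
  val predictionField =
    StructField("prediction", DoubleType, nullable = false, labelMetadata)
  StructType(schema.fields :+ predictionField)
}
{code}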

> spark.ml classification, regression abstractions should add metadata to 
> output column
> -
>
> Key: SPARK-7424
> URL: https://issues.apache.org/jira/browse/SPARK-7424
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Update ClassificationModel, ProbabilisticClassificationModel prediction to 
> include numClasses in output column metadata.
> Update RegressionModel to specify output column metadata as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7424:
-
Assignee: (was: Joseph K. Bradley)

> spark.ml classification, regression abstractions should add metadata to 
> output column
> -
>
> Key: SPARK-7424
> URL: https://issues.apache.org/jira/browse/SPARK-7424
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Update ClassificationModel, ProbabilisticClassificationModel prediction to 
> include numClasses in output column metadata.
> Update RegressionModel to specify output column metadata as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7284) Update streaming documentation for Spark 1.4.0 release

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7284:
-
Description: 
Things to update (continuously updated list)
- Python API for Kafka Direct
- Pointers to the new Streaming UI
- Update Kafka version to 0.8.2.1
- Add ref to RDD.foreachPartitionWithIndex (if merged)


  was:
Things to update (continuously updated list)
- Python API for Kafka Direct
- Pointers to the new Streaming UI
- Update Kafka version to 0.8.2.1



> Update streaming documentation for Spark 1.4.0 release
> --
>
> Key: SPARK-7284
> URL: https://issues.apache.org/jira/browse/SPARK-7284
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Blocker
>
> Things to update (continuously updated list)
> - Python API for Kafka Direct
> - Pointers to the new Streaming UI
> - Update Kafka version to 0.8.2.1
> - Add ref to RDD.foreachPartitionWithIndex (if merged)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7347) DAG visualization: add tooltips to RDDs on job page

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531866#comment-14531866
 ] 

Apache Spark commented on SPARK-7347:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/5957

> DAG visualization: add tooltips to RDDs on job page
> ---
>
> Key: SPARK-7347
> URL: https://issues.apache.org/jira/browse/SPARK-7347
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Attachments: tooltip.png
>
>
> Currently it's just a bunch of dots and it's not super clear what they 
> represent. Once we add some tooltips it will be very clear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7347) DAG visualization: add tooltips to RDDs on job page

2015-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7347:
-
Summary: DAG visualization: add tooltips to RDDs on job page  (was: DAG 
visualization: add hover to RDDs on job page)

> DAG visualization: add tooltips to RDDs on job page
> ---
>
> Key: SPARK-7347
> URL: https://issues.apache.org/jira/browse/SPARK-7347
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Attachments: tooltip.png
>
>
> Currently it's just a bunch of dots and it's not super clear what they 
> represent. Once we add some tooltips it will be very clear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7347) DAG visualization: add tooltips to RDDs on job page

2015-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7347:
-
Attachment: tooltip.png

> DAG visualization: add tooltips to RDDs on job page
> ---
>
> Key: SPARK-7347
> URL: https://issues.apache.org/jira/browse/SPARK-7347
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
> Attachments: tooltip.png
>
>
> Currently it's just a bunch of dots and it's not super clear what they 
> represent. Once we add some tooltips it will be very clear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7347) DAG visualization: add hover to RDDs on job page

2015-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7347:
-
Attachment: (was: job-page-hover.png)

> DAG visualization: add hover to RDDs on job page
> 
>
> Key: SPARK-7347
> URL: https://issues.apache.org/jira/browse/SPARK-7347
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> Currently it's just a bunch of dots and it's not super clear what they 
> represent. Once we add some tooltips it will be very clear.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7426:


 Summary: spark.ml AttributeFactory.fromStructField should allow 
other NumericTypes
 Key: SPARK-7426
 URL: https://issues.apache.org/jira/browse/SPARK-7426
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Minor


It currently supports only DoubleType, but it should support other numeric 
types, at least for fromStructField (importing into the ML attribute format, 
rather than exporting).
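
For illustration, the import-side check could be relaxed roughly like this (a 
standalone sketch, not the actual AttributeFactory code; the helper name is 
hypothetical):

{code}
import org.apache.spark.sql.types.{NumericType, StructField}

// Illustrative sketch: accept any numeric column on import instead of only DoubleType.
def checkNumericField(field: StructField): Unit = field.dataType match {
  case _: NumericType => // ok: e.g. IntegerType, LongType, FloatType, DoubleType
  case other => throw new IllegalArgumentException(
    s"ML attribute import requires a numeric column, but field '${field.name}' has type $other.")
}
{code}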



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7425) spark.ml Predictor should support other numeric types for label

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7425:


 Summary: spark.ml Predictor should support other numeric types for 
label
 Key: SPARK-7425
 URL: https://issues.apache.org/jira/browse/SPARK-7425
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Minor


Currently, the Predictor abstraction expects the input labelCol type to be 
DoubleType, but we should support other numeric types.  This will involve 
updating the PredictorParams.validateAndTransformSchema method.
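
A minimal sketch of the direction (assuming the check lives in 
validateAndTransformSchema and the label is cast to double before training; 
the helper names below are hypothetical):

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, NumericType, StructType}

// Illustrative sketch: relax the label column check from DoubleType to any NumericType.
def checkLabelColumn(schema: StructType, labelCol: String): Unit = {
  val dt = schema(labelCol).dataType
  require(dt.isInstanceOf[NumericType],
    s"Label column '$labelCol' must be of a numeric type, but was $dt.")
}

// Cast the label to double when extracting the training data, so downstream code is unchanged.
def labelAsDouble(dataset: DataFrame, labelCol: String) =
  dataset.select(col(labelCol).cast(DoubleType).as(labelCol))
{code}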



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7424) spark.ml classification, regression abstractions should add metadata to output column

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7424:


 Summary: spark.ml classification, regression abstractions should 
add metadata to output column
 Key: SPARK-7424
 URL: https://issues.apache.org/jira/browse/SPARK-7424
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Minor


Update ClassificationModel, ProbabilisticClassificationModel prediction to 
include numClasses in output column metadata.
Update RegressionModel to specify output column metadata as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7422) Add argmax to Vector, SparseVector

2015-05-06 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-7422:
-
Labels: starter  (was: )

> Add argmax to Vector, SparseVector
> --
>
> Key: SPARK-7422
> URL: https://issues.apache.org/jira/browse/SPARK-7422
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Priority: Minor
>  Labels: starter
>
> DenseVector has an argmax method which is currently private to Spark.  It 
> would be nice to add that method to Vector and SparseVector.  Adding it to 
> SparseVector would require being careful about handling the inactive elements 
> correctly and efficiently.
> We should make argmax public and add unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7396) Update Producer in Kafka example to use new API of Kafka 0.8.2

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7396:
-
Fix Version/s: 1.4.0

> Update Producer in Kafka example to use new API of Kafka 0.8.2
> --
>
> Key: SPARK-7396
> URL: https://issues.apache.org/jira/browse/SPARK-7396
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples, Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 1.4.0
>
>
> Due to the Kafka upgrade, the current KafkaWordCountProducer throws the 
> exception below; we need to update the code accordingly.
> {code}
> Exception in thread "main" kafka.common.FailedToSendMessageException: Failed 
> to send messages after 3 tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at kafka.producer.Producer.send(Producer.scala:77)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer$.main(KafkaWordCount.scala:96)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer.main(KafkaWordCount.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:623)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
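
For reference, the new producer API introduced in Kafka 0.8.2 looks roughly 
like this (broker address, topic, and message are placeholders; a sketch of 
the direction only, not the final example code):

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")

// The new producer replaces kafka.producer.Producer / ProducerConfig from the old API.
val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("wordcount", "hello hello world"))
producer.close()
{code}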



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7423) spark.ml Classifier predict should not convert vectors to dense format

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7423:


 Summary: spark.ml Classifier predict should not convert vectors to 
dense format
 Key: SPARK-7423
 URL: https://issues.apache.org/jira/browse/SPARK-7423
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Minor


spark.ml.classification.ClassificationModel and 
ProbabilisticClassificationModel both use DenseVector.argmax to implement 
prediction (computing the prediction from the rawPrediction or probability 
Vectors).  It would be best to implement argmax for Vector and SparseVector and 
use Vector.argmax, rather than converting Vectors to dense format.
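
As a rough illustration of the intended behavior (a standalone stand-in until 
SPARK-7422 adds argmax to Vector; the helper names are hypothetical, and the 
real implementation should iterate a SparseVector's active entries directly):

{code}
import org.apache.spark.mllib.linalg.Vector

// Illustrative stand-in: argmax over any Vector through the public apply(i) accessor,
// without converting to a DenseVector first. Ties resolve to the first maximal index.
def argmax(v: Vector): Int = (0 until v.size).maxBy(i => v(i))

// Prediction then stays sparse-friendly:
def predictFromRaw(rawPrediction: Vector): Double = argmax(rawPrediction).toDouble
{code}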



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7396) Update Producer in Kafka example to use new API of Kafka 0.8.2

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7396:
-
Issue Type: Improvement  (was: Bug)

> Update Producer in Kafka example to use new API of Kafka 0.8.2
> --
>
> Key: SPARK-7396
> URL: https://issues.apache.org/jira/browse/SPARK-7396
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples, Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 1.4.0
>
>
> Due to the Kafka upgrade, the current KafkaWordCountProducer throws the 
> exception below; we need to update the code accordingly.
> {code}
> Exception in thread "main" kafka.common.FailedToSendMessageException: Failed 
> to send messages after 3 tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at kafka.producer.Producer.send(Producer.scala:77)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer$.main(KafkaWordCount.scala:96)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer.main(KafkaWordCount.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:623)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7405) Fix the bug that ReceiverInputDStream doesn't report InputInfo

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-7405.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

> Fix the bug that ReceiverInputDStream doesn't report InputInfo
> --
>
> Key: SPARK-7405
> URL: https://issues.apache.org/jira/browse/SPARK-7405
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 1.4.0
>
>
> The bug arose because SPARK-7139 unintentionally removed some code from 
> SPARK-7112 here: 
> https://github.com/apache/spark/commit/1854ac326a9cc6014817d8df30ed0458eee5d7d1#diff-5c8651dd78abd20439b8eb938175075dL72



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7422) Add argmax to Vector, SparseVector

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7422:


 Summary: Add argmax to Vector, SparseVector
 Key: SPARK-7422
 URL: https://issues.apache.org/jira/browse/SPARK-7422
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Joseph K. Bradley
Priority: Minor


DenseVector has an argmax method which is currently private to Spark.  It would 
be nice to add that method to Vector and SparseVector.  Adding it to 
SparseVector would require being careful about handling the inactive elements 
correctly and efficiently.

We should make argmax public and add unit tests.
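
A sketch of how the sparse case could be handled (illustrative only; the helper 
names and tie-breaking below are not the final API):

{code}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// Illustrative sketch of a public argmax. For a SparseVector, inactive entries are
// implicit zeros, so an unstored position can win when every stored value is negative.
def argmax(v: Vector): Int = v match {
  case dv: DenseVector =>
    var best = 0
    var i = 1
    while (i < dv.values.length) {
      if (dv.values(i) > dv.values(best)) best = i
      i += 1
    }
    best
  case sv: SparseVector =>
    var best = 0
    var bestValue = Double.NegativeInfinity
    var i = 0
    while (i < sv.values.length) {             // scan only the active entries
      if (sv.values(i) > bestValue) { bestValue = sv.values(i); best = sv.indices(i) }
      i += 1
    }
    if (sv.values.length == sv.size || bestValue > 0.0) best
    else firstInactiveIndex(sv)                // an implicit zero is the maximum
}

// First index that is not stored in the (sorted) indices array, i.e. an implicit zero.
def firstInactiveIndex(sv: SparseVector): Int = {
  var expected = 0
  var i = 0
  while (i < sv.indices.length && sv.indices(i) == expected) { expected += 1; i += 1 }
  expected
}
{code}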



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7421) Online LDA cleanups

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7421:
---

Assignee: Apache Spark  (was: Joseph K. Bradley)

> Online LDA cleanups
> ---
>
> Key: SPARK-7421
> URL: https://issues.apache.org/jira/browse/SPARK-7421
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> Planned changes, primarily to allow us more flexibility in the future:
> * Rename "tau_0" to "tau0"
> * Mark LDAOptimizer trait sealed and DeveloperApi.
> * Mark LDAOptimizer subclasses as final.
> * Mark setOptimizer (the one taking an LDAOptimizer) and getOptimizer as 
> DeveloperApi since we may need to change them in the future



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7421) Online LDA cleanups

2015-05-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531819#comment-14531819
 ] 

Apache Spark commented on SPARK-7421:
-

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/5956

> Online LDA cleanups
> ---
>
> Key: SPARK-7421
> URL: https://issues.apache.org/jira/browse/SPARK-7421
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Planned changes, primarily to allow us more flexibility in the future:
> * Rename "tau_0" to "tau0"
> * Mark LDAOptimizer trait sealed and DeveloperApi.
> * Mark LDAOptimizer subclasses as final.
> * Mark setOptimizer (the one taking an LDAOptimizer) and getOptimizer as 
> DeveloperApi since we may need to change them in the future



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-7421) Online LDA cleanups

2015-05-06 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7421:
---

Assignee: Joseph K. Bradley  (was: Apache Spark)

> Online LDA cleanups
> ---
>
> Key: SPARK-7421
> URL: https://issues.apache.org/jira/browse/SPARK-7421
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Planned changes, primarily to allow us more flexibility in the future:
> * Rename "tau_0" to "tau0"
> * Mark LDAOptimizer trait sealed and DeveloperApi.
> * Mark LDAOptimizer subclasses as final.
> * Mark setOptimizer (the one taking an LDAOptimizer) and getOptimizer as 
> DeveloperApi since we may need to change them in the future



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7397) Add missing input information report back to ReceiverInputDStream due to SPARK-7139

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das closed SPARK-7397.

Resolution: Duplicate

> Add missing input information report back to ReceiverInputDStream due to 
> SPARK-7139
> ---
>
> Key: SPARK-7397
> URL: https://issues.apache.org/jira/browse/SPARK-7397
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>
> The input information report is missing due to the refactoring of 
> ReceiverInputDStream in SPARK-7139.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7396) Update Producer in Kafka example to use new API of Kafka 0.8.2

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7396:
-
Assignee: Saisai Shao

> Update Producer in Kafka example to use new API of Kafka 0.8.2
> --
>
> Key: SPARK-7396
> URL: https://issues.apache.org/jira/browse/SPARK-7396
> Project: Spark
>  Issue Type: Bug
>  Components: Examples, Streaming
>Affects Versions: 1.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>
> Due to the Kafka upgrade, the current KafkaWordCountProducer throws the 
> exception below; we need to update the code accordingly.
> {code}
> Exception in thread "main" kafka.common.FailedToSendMessageException: Failed 
> to send messages after 3 tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at kafka.producer.Producer.send(Producer.scala:77)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer$.main(KafkaWordCount.scala:96)
>   at 
> org.apache.spark.examples.streaming.KafkaWordCountProducer.main(KafkaWordCount.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:623)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7405) Fix the bug that ReceiverInputDStream doesn't report InputInfo

2015-05-06 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-7405:
-
Assignee: Shixiong Zhu

> Fix the bug that ReceiverInputDStream doesn't report InputInfo
> --
>
> Key: SPARK-7405
> URL: https://issues.apache.org/jira/browse/SPARK-7405
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> The bug arose because SPARK-7139 unintentionally removed some code from 
> SPARK-7112 here: 
> https://github.com/apache/spark/commit/1854ac326a9cc6014817d8df30ed0458eee5d7d1#diff-5c8651dd78abd20439b8eb938175075dL72



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7421) Online LDA cleanups

2015-05-06 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-7421:


 Summary: Online LDA cleanups
 Key: SPARK-7421
 URL: https://issues.apache.org/jira/browse/SPARK-7421
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Minor


Planned changes, primarily to allow us more flexibility in the future:
* Rename "tau_0" to "tau0"
* Mark LDAOptimizer trait sealed and DeveloperApi.
* Mark LDAOptimizer subclasses as final.
* Mark setOptimizer (the one taking an LDAOptimizer) and getOptimizer as 
DeveloperApi since we may need to change them in the future




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7377) DAG visualization: JS error when there is only 1 RDD

2015-05-06 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7377.

   Resolution: Fixed
Fix Version/s: 1.4.0

> DAG visualization: JS error when there is only 1 RDD
> 
>
> Key: SPARK-7377
> URL: https://issues.apache.org/jira/browse/SPARK-7377
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.4.0
>
> Attachments: viz-bug.png
>
>
> See screenshot. There is a simple fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


