[jira] [Created] (SPARK-7554) Throw errors when an active StreamingContext is used to create DStreams and output operations
Tathagata Das created SPARK-7554: Summary: Throw errors when an active StreamingContext is used to create DStreams and output operations Key: SPARK-7554 URL: https://issues.apache.org/jira/browse/SPARK-7554 Project: Spark Issue Type: Improvement Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7026) LeftSemiJoin can not work when it has both equal condition and not equal condition.
[ https://issues.apache.org/jira/browse/SPARK-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei closed SPARK-7026. - Resolution: Duplicate LeftSemiJoin can not work when it has both equal condition and not equal condition. - Key: SPARK-7026 URL: https://issues.apache.org/jira/browse/SPARK-7026 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Zhongshuai Pei Run a SQL query like the following {panel} select * from web_sales ws1 left semi join web_sales ws2 on ws1.ws_order_number = ws2.ws_order_number and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk {panel} and it fails with an exception {panel} Couldn't find ws_warehouse_sk#287 in {ws_sold_date_sk#237,ws_sold_time_sk#238,ws_ship_date_sk#239,ws_item_sk#240,ws_bill_customer_sk#241,ws_bill_cdemo_sk#242,ws_bill_hdemo_sk#243,ws_bill_addr_sk#244,ws_ship_customer_sk#245,ws_ship_cdemo_sk#246,ws_ship_hdemo_sk#247,ws_ship_addr_sk#248,ws_web_page_sk#249,ws_web_site_sk#250,ws_ship_mode_sk#251,ws_warehouse_sk#252,ws_promo_sk#253,ws_order_number#254,ws_quantity#255,ws_wholesale_cost#256,ws_list_price#257,ws_sales_price#258,ws_ext_discount_amt#259,ws_ext_sales_price#260,ws_ext_wholesale_cost#261,ws_ext_list_price#262,ws_ext_tax#263,ws_coupon_amt#264,ws_ext_ship_cost#265,ws_net_paid#266,ws_net_paid_inc_tax#267,ws_net_paid_inc_ship#268,ws_net_paid_inc_ship_tax#269,ws_net_profit#270,ws_sold_date#236} at scala.sys.package$.error(package.scala:27) {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7545) Bernoulli NaiveBayes should validate data
[ https://issues.apache.org/jira/browse/SPARK-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539081#comment-14539081 ] Joseph K. Bradley commented on SPARK-7545: -- OK, I appreciate it! Bernoulli NaiveBayes should validate data - Key: SPARK-7545 URL: https://issues.apache.org/jira/browse/SPARK-7545 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.4.0 Reporter: Joseph K. Bradley Assignee: Leah McGuire Priority: Minor Bernoulli NaiveBayes expects input features to take values 0 or 1, but it does not actually check that. It should check and throw an exception if it finds invalid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
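A rough illustration of the validation requested in SPARK-7545, as a hedged sketch (the helper object and method names are hypothetical, not the MLlib patch): reject any feature value other than 0 or 1 before training a Bernoulli model.
{code}
import org.apache.spark.mllib.regression.LabeledPoint

object BernoulliInputCheck {
  // Throw if any feature value is something other than 0.0 or 1.0.
  def requireZeroOneFeatures(p: LabeledPoint): Unit = {
    val ok = p.features.toArray.forall(v => v == 0.0 || v == 1.0)
    require(ok, s"Bernoulli naive Bayes requires 0/1 feature values, but found: ${p.features}")
  }
}
{code}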
[jira] [Resolved] (SPARK-7331) Create HiveConf per application instead of per query in HiveQl.scala
[ https://issues.apache.org/jira/browse/SPARK-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-7331. - Resolution: Fixed Fix Version/s: 1.2.3 Issue resolved by pull request 6036 [https://github.com/apache/spark/pull/6036] Create HiveConf per application instead of per query in HiveQl.scala Key: SPARK-7331 URL: https://issues.apache.org/jira/browse/SPARK-7331 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.0, 1.3.0 Reporter: Nitin Goyal Priority: Minor Fix For: 1.2.3 A new HiveConf is created per query in the getAst method in HiveQl.scala:
{code}
def getAst(sql: String): ASTNode = {
  /*
   * Context has to be passed in Hive 0.13.1.
   * Otherwise, there will be a NullPointerException
   * when retrieving properties from HiveConf.
   */
  val hContext = new Context(new HiveConf())
  val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
  hContext.clear()
  node
}
{code}
Creating a HiveConf adds a minimum of 90 ms of delay per query, so its creation should be moved into the enclosing object so that it happens only once. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
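A minimal sketch of the proposed change, assuming the Hive 0.13 classes referenced in the snippet above; the object layout and the hiveConf field name are illustrative, not the actual patch:
{code}
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.Context
import org.apache.hadoop.hive.ql.parse.{ASTNode, ParseDriver, ParseUtils}

object HiveQl {
  // Created once per application instead of once per query.
  private lazy val hiveConf = new HiveConf()

  def getAst(sql: String): ASTNode = {
    // A Context still has to be passed for Hive 0.13.1, but it can reuse the shared conf.
    val hContext = new Context(hiveConf)
    val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
    hContext.clear()
    node
  }
}
{code}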
[jira] [Resolved] (SPARK-7538) Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge
[ https://issues.apache.org/jira/browse/SPARK-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7538. Resolution: Fixed This was a cross post from the mailing list. The poster closed the thread with the following: {code} Ted, many thanks. I'm not used to Java dependencies so this was a real head-scratcher for me. Downloading the two metrics packages from the maven repository (metrics-core, metrics-annotation) and supplying it on the spark-submit command line worked. My final spark-submit for a python project using Kafka as an input source: /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \ --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \ --jars /home/ubuntu/jars/metrics-core-2.2.0.jar,/home/ubuntu/jars/metrics-annotation-2.2.0.jar \ --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \ --master spark://127.0.0.1:7077 \ affected_hosts.py Now we're seeing data from the stream. Thanks again! {code} Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge --- Key: SPARK-7538 URL: https://issues.apache.org/jira/browse/SPARK-7538 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.1 Environment: Ubuntu 14.04 LTS java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode) Spark 1.3.1 release. Reporter: Lee McFadden We have a simple streaming job, the components of which work fine in a batch environment reading from a cassandra table as the source. We adapted it to work with streaming using the Python libs. Submit command line: {code} /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \ --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \ --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \ --master spark://127.0.0.1:7077 \ affected_hosts.py {code} When we run the streaming job everything starts just fine, then we see the following in the logs: {code} 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge at kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151) at kafka.consumer.ZookeeperConsumerConnector.init(ZookeeperConsumerConnector.scala:115) at kafka.consumer.ZookeeperConsumerConnector.init(ZookeeperConsumerConnector.scala:128) at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89) at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100) at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121) at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106) at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298) at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge at java.net.URLClassLoader$1.run(URLClassLoader.java:372) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:360) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 17 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7520) Install Jekyll On Jenkins Machines
[ https://issues.apache.org/jira/browse/SPARK-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7520. Resolution: Fixed Fix Version/s: 1.4.0 All green - awesome thanks [~shaneknapp]! Install Jekyll On Jenkins Machines -- Key: SPARK-7520 URL: https://issues.apache.org/jira/browse/SPARK-7520 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp Priority: Critical Fix For: 1.4.0 Hey Shane, SPARK-1517 requires us to install Jekyll on the build machines. Any chance we can do that? http://jekyllrb.com/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6876) DataFrame.na.replace value support for Python
[ https://issues.apache.org/jira/browse/SPARK-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-6876: -- Assignee: Adrian Wang DataFrame.na.replace value support for Python - Key: SPARK-6876 URL: https://issues.apache.org/jira/browse/SPARK-6876 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Adrian Wang Scala/Java support is in. We should provide the Python version, similar to what Pandas supports. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.replace.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7411) CTAS parser is incomplete
[ https://issues.apache.org/jira/browse/SPARK-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-7411. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 5963 [https://github.com/apache/spark/pull/5963] CTAS parser is incomplete - Key: SPARK-7411 URL: https://issues.apache.org/jira/browse/SPARK-7411 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Cheng Hao Priority: Blocker Fix For: 1.4.0 The change to use an isolated classloader removed the use of the Semantic Analyzer for parsing CTAS queries. We should fix this before the release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7553: - Description: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. was: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7553) Add methods to maintain a singleton StreamingContext
Tathagata Das created SPARK-7553: Summary: Add methods to maintain a singleton StreamingContext Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, it's very easy to lose a reference to a StreamingContext by overwriting the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence it's best to maintain a singleton reference to the active context, so that we never lose the reference to the active context. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
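For illustration, a minimal sketch of the singleton pattern this ticket describes, using a hypothetical helper object; this is not necessarily the API shape added by the eventual patch:
{code}
import org.apache.spark.streaming.StreamingContext

object ActiveStreamingContext {
  @volatile private var active: Option[StreamingContext] = None

  // Remember the context that was started, even if the user's variable is later overwritten.
  def set(ssc: StreamingContext): Unit = synchronized { active = Some(ssc) }

  // Return the active context if one exists, otherwise create, remember, and return a new one.
  def getActiveOrCreate(create: () => StreamingContext): StreamingContext = synchronized {
    active.getOrElse {
      val ssc = create()
      active = Some(ssc)
      ssc
    }
  }
}
{code}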
[jira] [Assigned] (SPARK-7552) Close files correctly when iteration is finished in WAL recovery
[ https://issues.apache.org/jira/browse/SPARK-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7552: --- Assignee: Apache Spark Close files correctly when iteration is finished in WAL recovery Key: SPARK-7552 URL: https://issues.apache.org/jira/browse/SPARK-7552 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.1, 1.4.0 Reporter: Saisai Shao Assignee: Apache Spark Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7553: - Description: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Since this is useful in REPL environments, its best to add this as an Experimental support in the Scala API only. was: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Since this is useful in REPL environments, its best to add this as an Experimental support in the Scala API only. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7553: - Description: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks. was: In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Since this is useful in REPL environments, its best to add this as an Experimental support in the Scala API only. Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, its very easy to lose a reference to a StreamingContext by overriding the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence its best to maintain a singleton reference to the active context, so that we never loose reference for the active context. Since this problem occurs useful in REPL environments, its best to add this as an Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7552) Close files correctly when iteration is finished in WAL recovery
[ https://issues.apache.org/jira/browse/SPARK-7552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-7552: --- Labels: backport-needed (was: ) Close files correctly when iteration is finished in WAL recovery Key: SPARK-7552 URL: https://issues.apache.org/jira/browse/SPARK-7552 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.1, 1.4.0 Reporter: Saisai Shao Labels: backport-needed Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7553: --- Assignee: Tathagata Das (was: Apache Spark) Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, it's very easy to lose a reference to a StreamingContext by overwriting the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence it's best to maintain a singleton reference to the active context, so that we never lose the reference to the active context. Since this problem occurs mainly in REPL environments, it's best to add this as Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539140#comment-14539140 ] Apache Spark commented on SPARK-7553: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/6070 Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker In a REPL/notebook environment, it's very easy to lose a reference to a StreamingContext by overwriting the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence it's best to maintain a singleton reference to the active context, so that we never lose the reference to the active context. Since this problem occurs mainly in REPL environments, it's best to add this as Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7553) Add methods to maintain a singleton StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7553: --- Assignee: Apache Spark (was: Tathagata Das) Add methods to maintain a singleton StreamingContext - Key: SPARK-7553 URL: https://issues.apache.org/jira/browse/SPARK-7553 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Tathagata Das Assignee: Apache Spark Priority: Blocker In a REPL/notebook environment, it's very easy to lose a reference to a StreamingContext by overwriting the variable name. So if you happen to execute the following commands {{ val ssc = new StreamingContext(...) // cmd 1 ssc.start() // cmd 2 ... val ssc = new StreamingContext(...) // accidentally run cmd 1 again }} The value of ssc will be overwritten. Now you can neither start the new context (as only one context can be started), nor stop the previous context (as the reference is lost). Hence it's best to maintain a singleton reference to the active context, so that we never lose the reference to the active context. Since this problem occurs mainly in REPL environments, it's best to add this as Experimental support in the Scala API only so that it can be used in Scala REPLs and notebooks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7554) Throw exception when an active StreamingContext is used to create DStreams and output operations
[ https://issues.apache.org/jira/browse/SPARK-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7554: - Priority: Blocker (was: Critical) Throw exception when an active StreamingContext is used to create DStreams and output operations Key: SPARK-7554 URL: https://issues.apache.org/jira/browse/SPARK-7554 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7554) Throw errors when an active StreamingContext is used to create DStreams and output operations
[ https://issues.apache.org/jira/browse/SPARK-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7554: - Component/s: Streaming Target Version/s: 1.4.0 Throw errors when an active StreamingContext is used to create DStreams and output operations - Key: SPARK-7554 URL: https://issues.apache.org/jira/browse/SPARK-7554 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7554) Throw exception when an active StreamingContext is used to create DStreams and output operations
[ https://issues.apache.org/jira/browse/SPARK-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7554: - Summary: Throw exception when an active StreamingContext is used to create DStreams and output operations (was: Throw errors when an active StreamingContext is used to create DStreams and output operations) Throw exception when an active StreamingContext is used to create DStreams and output operations Key: SPARK-7554 URL: https://issues.apache.org/jira/browse/SPARK-7554 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7554) Throw exception when an active StreamingContext is used to create DStreams and output operations
[ https://issues.apache.org/jira/browse/SPARK-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539143#comment-14539143 ] Tathagata Das commented on SPARK-7554: -- Currently, adding DStreams to an active context is not supported, but the resulting errors are ambiguous. Make the errors more explicit. Throw exception when an active StreamingContext is used to create DStreams and output operations Key: SPARK-7554 URL: https://issues.apache.org/jira/browse/SPARK-7554 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
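A small sketch of the kind of fail-fast check SPARK-7554 calls for, assuming the getState() API proposed in SPARK-7530; the helper object, method name, and message are illustrative, not the actual patch:
{code}
import org.apache.spark.streaming.{StreamingContext, StreamingContextState}

object DStreamGuards {
  // Intended to be called from DStream constructors and output operations.
  def verifyContextNotStarted(ssc: StreamingContext): Unit = {
    if (ssc.getState() == StreamingContextState.ACTIVE) {
      throw new IllegalStateException(
        "Adding new inputs, transformations, and output operations after " +
          "starting a context is not supported")
    }
  }
}
{code}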
[jira] [Resolved] (SPARK-7509) Add drop column to Python DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7509. Resolution: Fixed Fix Version/s: 1.4.0 Add drop column to Python DataFrame API --- Key: SPARK-7509 URL: https://issues.apache.org/jira/browse/SPARK-7509 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6198) Support select current_database()
[ https://issues.apache.org/jira/browse/SPARK-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei closed SPARK-6198. - Resolution: Won't Fix Support select current_database() --- Key: SPARK-6198 URL: https://issues.apache.org/jira/browse/SPARK-6198 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.2.1 Reporter: Zhongshuai Pei The evaluate method in UDFCurrentDB has changed; it now just throws an exception, but hiveUdfs calls this method and fails:
{code}
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
  throw new IllegalStateException("never");
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5129) make SqlContext support select date +/- XX DAYS from table
[ https://issues.apache.org/jira/browse/SPARK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei closed SPARK-5129. - Resolution: Won't Fix make SqlContext support select date +/- XX DAYS from table -- Key: SPARK-5129 URL: https://issues.apache.org/jira/browse/SPARK-5129 Project: Spark Issue Type: Improvement Components: SQL Reporter: Zhongshuai Pei Priority: Minor Example: given a table created with create table test (date: Date) containing 2014-01-01, 2014-01-02, 2014-01-03, running select date + 10 DAYS from test should return 2014-01-11, 2014-01-12, 2014-01-13, and running select date - 10 DAYS from test should return 2013-12-22, 2013-12-23, 2013-12-24. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6768) Do not support float/double union decimal or decimal(a ,b) union decimal(c, d)
[ https://issues.apache.org/jira/browse/SPARK-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongshuai Pei closed SPARK-6768. - Resolution: Fixed Do not support float/double union decimal or decimal(a ,b) union decimal(c, d) Key: SPARK-6768 URL: https://issues.apache.org/jira/browse/SPARK-6768 Project: Spark Issue Type: Bug Components: SQL Reporter: Zhongshuai Pei SQL statements like the following are not supported: select cast(12.2056999 as float) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1; select cast(12.2056999 as double) from testData limit 1 union select cast(12.2041 as decimal(7, 4)) from testData limit 1; select cast(1241.20 as decimal(6, 2)) from testData limit 1 union select cast(1.204 as decimal(5, 3)) from testData limit 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7530) Add API to get the current state of a StreamingContext
Tathagata Das created SPARK-7530: Summary: Add API to get the current state of a StreamingContext Key: SPARK-7530 URL: https://issues.apache.org/jira/browse/SPARK-7530 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7531) Install GPG on Jenkins machines
[ https://issues.apache.org/jira/browse/SPARK-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538154#comment-14538154 ] shane knapp commented on SPARK-7531: is this version sufficient? -bash-4.1$ gpg --version gpg (GnuPG) 2.0.14 libgcrypt 1.4.5 Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Home: ~/.gnupg Supported algorithms: Pubkey: RSA, ELG, DSA Cipher: 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH, CAMELLIA128, CAMELLIA192, CAMELLIA256 Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224 Compression: Uncompressed, ZIP, ZLIB, BZIP2 Install GPG on Jenkins machines --- Key: SPARK-7531 URL: https://issues.apache.org/jira/browse/SPARK-7531 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp This one is also required for us to cut regular snapshot releases from Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7532) Make StreamingContext.start() idempotent
[ https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7532: - Component/s: Streaming Make StreamingContext.start() idempotent Key: SPARK-7532 URL: https://issues.apache.org/jira/browse/SPARK-7532 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Currently, calling StreamingContext.start() throws an error when the context is already started. This is inconsistent with StreamingContext.stop(), which is idempotent, that is, calling stop() on a stopped context is a no-op. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
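A self-contained sketch of the requested semantics, written as a toy state machine rather than the real StreamingContext internals; all names and states here are illustrative only:
{code}
// Toy model: start() is a no-op on an already-started context, stop() is a
// no-op on an already-stopped one, matching the idempotency asked for above.
class ToyStreamingContext {
  private object State extends Enumeration { val Initialized, Active, Stopped = Value }
  private var state = State.Initialized

  def start(): Unit = synchronized {
    state match {
      case State.Initialized => state = State.Active // really start the scheduler here
      case State.Active      => println("WARN: context already started, ignoring start()")
      case State.Stopped     => throw new IllegalStateException("context has been stopped")
    }
  }

  def stop(): Unit = synchronized {
    if (state != State.Stopped) state = State.Stopped // really stop the scheduler here
  }
}
{code}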
[jira] [Updated] (SPARK-7458) Check 1.3- 1.4 MLlib API compliance using java-compliance-checker
[ https://issues.apache.org/jira/browse/SPARK-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7458: - Description: We should do this after 1.4-rc1 is cut. Check 1.3- 1.4 MLlib API compliance using java-compliance-checker -- Key: SPARK-7458 URL: https://issues.apache.org/jira/browse/SPARK-7458 Project: Spark Issue Type: Sub-task Components: ML, MLlib Affects Versions: 1.4.0 Reporter: Xiangrui Meng We should do this after 1.4-rc1 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7355) FlakyTest - o.a.s.DriverSuite
[ https://issues.apache.org/jira/browse/SPARK-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7355: --- Assignee: Andrew Or (was: Apache Spark) FlakyTest - o.a.s.DriverSuite - Key: SPARK-7355 URL: https://issues.apache.org/jira/browse/SPARK-7355 Project: Spark Issue Type: Test Components: Spark Core, Tests Reporter: Tathagata Das Assignee: Andrew Or Priority: Blocker Labels: flaky-test -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7355) FlakyTest - o.a.s.DriverSuite
[ https://issues.apache.org/jira/browse/SPARK-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538185#comment-14538185 ] Apache Spark commented on SPARK-7355: - User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/6059 FlakyTest - o.a.s.DriverSuite - Key: SPARK-7355 URL: https://issues.apache.org/jira/browse/SPARK-7355 Project: Spark Issue Type: Test Components: Spark Core, Tests Reporter: Tathagata Das Assignee: Andrew Or Priority: Blocker Labels: flaky-test -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7355) FlakyTest - o.a.s.DriverSuite
[ https://issues.apache.org/jira/browse/SPARK-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7355: --- Assignee: Apache Spark (was: Andrew Or) FlakyTest - o.a.s.DriverSuite - Key: SPARK-7355 URL: https://issues.apache.org/jira/browse/SPARK-7355 Project: Spark Issue Type: Test Components: Spark Core, Tests Reporter: Tathagata Das Assignee: Apache Spark Priority: Blocker Labels: flaky-test -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7532) Make StreamingContext.start() idempotent
[ https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7532: --- Assignee: Tathagata Das (was: Apache Spark) Make StreamingContext.start() idempotent Key: SPARK-7532 URL: https://issues.apache.org/jira/browse/SPARK-7532 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Currently, calling StreamingContext.start() throws an error when the context is already started. This is inconsistent with StreamingContext.stop(), which is idempotent, that is, calling stop() on a stopped context is a no-op. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7532) Make StreamingContext.start() idempotent
[ https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538193#comment-14538193 ] Apache Spark commented on SPARK-7532: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/6060 Make StreamingContext.start() idempotent Key: SPARK-7532 URL: https://issues.apache.org/jira/browse/SPARK-7532 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Currently, calling StreamingContext.start() throws an error when the context is already started. This is inconsistent with StreamingContext.stop(), which is idempotent, that is, calling stop() on a stopped context is a no-op. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7532) Make StreamingContext.start() idempotent
[ https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7532: --- Assignee: Apache Spark (was: Tathagata Das) Make StreamingContext.start() idempotent Key: SPARK-7532 URL: https://issues.apache.org/jira/browse/SPARK-7532 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Apache Spark Priority: Blocker Currently, calling StreamingContext.start() throws an error when the context is already started. This is inconsistent with StreamingContext.stop(), which is idempotent, that is, calling stop() on a stopped context is a no-op. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7133) Implement struct, array, and map field accessor using apply in Scala and __getitem__ in Python
[ https://issues.apache.org/jira/browse/SPARK-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538272#comment-14538272 ] Nicholas Chammas commented on SPARK-7133: - [~rxin] - Should we also implement {{\_\_getitem\_\_}} access in PySpark for {{Row}}? Or does this patch also cover that? As of Spark 1.3.1, you can do {{row.field}} but not {{row\['field'\]}}. Implement struct, array, and map field accessor using apply in Scala and __getitem__ in Python -- Key: SPARK-7133 URL: https://issues.apache.org/jira/browse/SPARK-7133 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Wenchen Fan Priority: Blocker Fix For: 1.4.0 Typing {code} df.col[1] {code} and {code} df.col['field'] {code} is so much easier than {code} df.col.getField('field') df.col.getItem(1) {code} This would require us to define (in Column) an apply function in Scala, and a __getitem__ function in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7361) Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-7361. -- Resolution: Fixed Fix Version/s: 1.4.0 Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM --- Key: SPARK-7361 URL: https://issues.apache.org/jira/browse/SPARK-7361 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Fix For: 1.4.0 Currently, attempting to start a StreamingContext while another one is already started throws a confusing exception saying that the action name JobScheduler is already registered. Instead, it's best to throw a proper exception, as this is not supported. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
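A hedged sketch of the single-started-context check SPARK-7361 describes; the tracker object, method name, and message are hypothetical, not the actual fix:
{code}
import org.apache.spark.streaming.StreamingContext

object StartedContextTracker {
  @volatile private var started: Option[StreamingContext] = None

  // Called from start(); fails with a clear message instead of the confusing
  // "action name JobScheduler is already registered" error.
  def registerStart(ssc: StreamingContext): Unit = synchronized {
    if (started.exists(_ ne ssc)) {
      throw new IllegalStateException(
        "Only one StreamingContext may be started in this JVM; " +
          "the currently running one must be stopped first")
    }
    started = Some(ssc)
  }

  def registerStop(ssc: StreamingContext): Unit = synchronized {
    if (started.exists(_ eq ssc)) started = None
  }
}
{code}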
[jira] [Updated] (SPARK-7497) test_count_by_value_and_window is flaky
[ https://issues.apache.org/jira/browse/SPARK-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7497: - Assignee: Davies Liu (was: Tathagata Das) test_count_by_value_and_window is flaky --- Key: SPARK-7497 URL: https://issues.apache.org/jira/browse/SPARK-7497 Project: Spark Issue Type: Bug Components: PySpark, Streaming Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Davies Liu Labels: flaky-test Saw this test failure in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32268/console {code} == FAIL: test_count_by_value_and_window (__main__.WindowFunctionTests) -- Traceback (most recent call last): File pyspark/streaming/tests.py, line 418, in test_count_by_value_and_window self._test_func(input, func, expected) File pyspark/streaming/tests.py, line 133, in _test_func self.assertEqual(expected, result) AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]] First list contains 2 additional elements. First extra element 8: [6] - [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] ? -- + [[1], [2], [3], [4], [5], [6], [6], [6]] -- {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7522) ML Examples option for dataFormat should not be enclosed in angle brackets
[ https://issues.apache.org/jira/browse/SPARK-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7522. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6049 [https://github.com/apache/spark/pull/6049] ML Examples option for dataFormat should not be enclosed in angle brackets -- Key: SPARK-7522 URL: https://issues.apache.org/jira/browse/SPARK-7522 Project: Spark Issue Type: Bug Components: Examples Reporter: Bryan Cutler Priority: Minor Fix For: 1.4.0 Some ML examples include an option for specifying the data format, such as DecisionTreeExample, but the option name is enclosed in angle brackets, like opt[String]("<dataFormat>"). This is probably just a typo but makes it awkward to use the option. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7534) Fix the Stage table when a stage is missing
Shixiong Zhu created SPARK-7534: --- Summary: Fix the Stage table when a stage is missing Key: SPARK-7534 URL: https://issues.apache.org/jira/browse/SPARK-7534 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7400) PortableDataStream UDT
[ https://issues.apache.org/jira/browse/SPARK-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538236#comment-14538236 ] Eron Wright commented on SPARK-7400: - Given the above, my proposal is to modify [org.apache.spark.sql.catalyst.ScalaReflection::schemaFor|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L74] to return an instance of org.apache.spark.sql.types.PortableDataStreamUDT. PortableDataStream UDT -- Key: SPARK-7400 URL: https://issues.apache.org/jira/browse/SPARK-7400 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Eron Wright Improve support for PortableDataStream in a DataFrame by implementing PortableDataStreamUDT. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7443) MLlib 1.4 QA plan
[ https://issues.apache.org/jira/browse/SPARK-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7443: - Description: TODO: create JIRAs for each task and assign them accordingly. h2. API * Check API compliance using java-compliance-checker (SPARK-7458) * Audit new public APIs (from the generated html doc) ** Scala (do not forget to check the object doc) ** Java compatibility (SPARK-7529) ** Python API coverage * audit Pipeline APIs ** feature transformers ** tree models ** elastic-net ** ML attributes ** developer APIs * graduate spark.ml from alpha ** remove AlphaComponent annotations ** remove mima excludes for spark.ml h2. Algorithms and performance * list missing performance tests from spark-perf * LDA online/EM (SPARK-7455) * ElasticNet for linear regression and logistic regression (SPARK-7456) * Bernoulli naive Bayes (SPARK-7453) * PIC (SPARK-7454) * ALS.recommendAll (SPARK-7457) * perf-tests in Python correctness: * PMML ** scoring using PMML evaluator vs. MLlib models * save/load h2. Documentation and example code * create JIRAs for the user guide to each new algorithm and assign them to the corresponding author * create example code for major components ** cross validation in python ** pipeline with complex feature transformations (scala/java/python) ** elastic-net (possibly with cross validation) was: TODO: create JIRAs for each task and assign them accordingly. h2. API * Check API compliance using java-compliance-checker (SPARK-7458) * Audit new public APIs (from the generated html doc) ** Scala (do not forget to check the object doc) ** Java compatibility ** Python API coverage * audit Pipeline APIs ** feature transformers ** tree models ** elastic-net ** ML attributes ** developer APIs * graduate spark.ml from alpha ** remove AlphaComponent annotations ** remove mima excludes for spark.ml h2. Algorithms and performance * list missing performance tests from spark-perf * LDA online/EM (SPARK-7455) * ElasticNet for linear regression and logistic regression (SPARK-7456) * Bernoulli naive Bayes (SPARK-7453) * PIC (SPARK-7454) * ALS.recommendAll (SPARK-7457) * perf-tests in Python correctness: * PMML ** scoring using PMML evaluator vs. MLlib models * save/load h2. Documentation and example code * create JIRAs for the user guide to each new algorithm and assign them to the corresponding author * create example code for major components ** cross validation in python ** pipeline with complex feature transformations (scala/java/python) ** elastic-net (possibly with cross validation) MLlib 1.4 QA plan - Key: SPARK-7443 URL: https://issues.apache.org/jira/browse/SPARK-7443 Project: Spark Issue Type: Umbrella Components: ML, MLlib Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Joseph K. Bradley Priority: Critical TODO: create JIRAs for each task and assign them accordingly. h2. API * Check API compliance using java-compliance-checker (SPARK-7458) * Audit new public APIs (from the generated html doc) ** Scala (do not forget to check the object doc) ** Java compatibility (SPARK-7529) ** Python API coverage * audit Pipeline APIs ** feature transformers ** tree models ** elastic-net ** ML attributes ** developer APIs * graduate spark.ml from alpha ** remove AlphaComponent annotations ** remove mima excludes for spark.ml h2. 
Algorithms and performance * list missing performance tests from spark-perf * LDA online/EM (SPARK-7455) * ElasticNet for linear regression and logistic regression (SPARK-7456) * Bernoulli naive Bayes (SPARK-7453) * PIC (SPARK-7454) * ALS.recommendAll (SPARK-7457) * perf-tests in Python correctness: * PMML ** scoring using PMML evaluator vs. MLlib models * save/load h2. Documentation and example code * create JIRAs for the user guide to each new algorithm and assign them to the corresponding author * create example code for major components ** cross validation in python ** pipeline with complex feature transformations (scala/java/python) ** elastic-net (possibly with cross validation) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6092) Add RankingMetrics in PySpark/MLlib
[ https://issues.apache.org/jira/browse/SPARK-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-6092. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6044 [https://github.com/apache/spark/pull/6044] Add RankingMetrics in PySpark/MLlib --- Key: SPARK-6092 URL: https://issues.apache.org/jira/browse/SPARK-6092 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng Assignee: Yanbo Liang Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7529) Java compatibility check for MLlib 1.4
Xiangrui Meng created SPARK-7529: Summary: Java compatibility check for MLlib 1.4 Key: SPARK-7529 URL: https://issues.apache.org/jira/browse/SPARK-7529 Project: Spark Issue Type: Sub-task Components: ML, MLlib Affects Versions: 1.4.0 Reporter: Xiangrui Meng Check Java compatibility for MLlib 1.4. We should create separate JIRAs for each possible issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7530) Add API to get the current state of a StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7530: --- Assignee: Tathagata Das (was: Apache Spark) Add API to get the current state of a StreamingContext -- Key: SPARK-7530 URL: https://issues.apache.org/jira/browse/SPARK-7530 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7522) ML Examples option for dataFormat should not be enclosed in angle brackets
[ https://issues.apache.org/jira/browse/SPARK-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7522: - Target Version/s: 1.1.2, 1.2.3, 1.3.2 Affects Version/s: 1.4.0 1.1.1 1.2.2 1.3.1 ML Examples option for dataFormat should not be enclosed in angle brackets -- Key: SPARK-7522 URL: https://issues.apache.org/jira/browse/SPARK-7522 Project: Spark Issue Type: Bug Components: Examples Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.0 Reporter: Bryan Cutler Assignee: Bryan Cutler Priority: Minor Fix For: 1.4.0 Some ML examples include an option for specifying the data format, such as DecisionTreeExample, but the option name is enclosed in angle brackets, like opt[String]("<dataFormat>"). This is probably just a typo but makes it awkward to use the option. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7520) Install Jekyll On Jenkins Machines
[ https://issues.apache.org/jira/browse/SPARK-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538148#comment-14538148 ] shane knapp commented on SPARK-7520: any particular version of ruby? 1.8.7? Install Jekyll On Jenkins Machines -- Key: SPARK-7520 URL: https://issues.apache.org/jira/browse/SPARK-7520 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp Priority: Critical Hey Shane, SPARK-1517 requires us to install Jekyll on the build machines. Any chance we can do that? http://jekyllrb.com/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7504) NullPointerException when initializing SparkContext in YARN-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538159#comment-14538159 ] Marcelo Vanzin commented on SPARK-7504: --- If I understand correctly, what you're doing is running the equivalent of this in your code, right? {code} new SparkContext(new SparkConf().set("spark.master", "yarn-cluster")) {code} That's not really supported, since that will not work in yarn-cluster mode, even if the ApplicationMaster launches successfully. I also took a look at your PR and that won't help. The fix here, if any, is to not allow the above code to work by throwing an exception early. NullPointerException when initializing SparkContext in YARN-cluster mode Key: SPARK-7504 URL: https://issues.apache.org/jira/browse/SPARK-7504 Project: Spark Issue Type: Bug Components: Deploy, YARN Reporter: Zoltán Zvara Labels: deployment, yarn, yarn-client It is not clear to most users that, while running Spark on YARN, a {{SparkContext}} with a given execution plan can be run locally as {{yarn-client}}, but cannot deploy itself to the cluster. This is currently performed using {{org.apache.spark.deploy.yarn.Client}}. {color:gray} I think we should support deployment through {{SparkContext}}, but this is not the point I wish to make here. {color} Configuring a {{SparkContext}} to deploy itself currently will yield an {{ERROR}} while accessing {{spark.yarn.app.id}} in {{YarnClusterSchedulerBackend}}, and after that a {{NullPointerException}} while referencing the {{ApplicationMaster}} instance. Spark should clearly inform the user that it might be running in {{yarn-cluster}} mode without a proper submission using {{Client}} and that deploying is not supported directly from {{SparkContext}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
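For SPARK-7504, the fail-fast check suggested above could look roughly like this (a sketch only; the object and method names are made up, and the real check would live inside SparkContext initialization rather than in a helper):
{code}
import org.apache.spark.{SparkConf, SparkException}

object MasterValidation {
  // Illustrative helper: reject direct SparkContext creation with a cluster-only master.
  def failFastOnClusterMaster(conf: SparkConf): Unit = {
    if (conf.get("spark.master", "") == "yarn-cluster") {
      throw new SparkException(
        "Master 'yarn-cluster' cannot be used when creating a SparkContext directly; " +
        "submit the application with spark-submit or org.apache.spark.deploy.yarn.Client instead.")
    }
  }
}
{code}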
[jira] [Commented] (SPARK-7507) pyspark.sql.types.StructType and Row should implement __iter__()
[ https://issues.apache.org/jira/browse/SPARK-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538283#comment-14538283 ] Nicholas Chammas commented on SPARK-7507: - On a related note, perhaps we should also offer a method to quickly turn Python dicts back into StructTypes or Rows. pyspark.sql.types.StructType and Row should implement __iter__() Key: SPARK-7507 URL: https://issues.apache.org/jira/browse/SPARK-7507 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Reporter: Nicholas Chammas Priority: Minor {{StructType}} looks an awful lot like a Python dictionary. However, it doesn't implement {{\_\_iter\_\_()}}, so doing a quick conversion like this doesn't work:
{code}
>>> df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
>>> df.schema
StructType(List(StructField(name,StringType,true)))
>>> dict(df.schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'StructType' object is not iterable
{code}
This would be super helpful for doing any custom schema manipulations without having to go through the whole {{.json() -> json.loads() -> manipulate() -> json.dumps() -> .fromJson()}} charade. Same goes for {{Row}}, which offers an [{{asDict()}}|https://spark.apache.org/docs/1.3.1/api/python/pyspark.sql.html#pyspark.sql.Row.asDict] method but doesn't support the more Pythonic {{dict(Row)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7528) Java compatibility of RankingMetrics
Xiangrui Meng created SPARK-7528: Summary: Java compatibility of RankingMetrics Key: SPARK-7528 URL: https://issues.apache.org/jira/browse/SPARK-7528 Project: Spark Issue Type: Task Components: MLlib Affects Versions: 1.4.0 Reporter: Xiangrui Meng This is to check Java compatibility of RankingMetrics, which uses ClassTag. Maybe we should create a factory method for Java users that uses a fake tag. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
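For SPARK-7528, one possible shape for such a factory (a hypothetical sketch, not an existing Spark API; the object and method names are invented):
{code}
import scala.reflect.ClassTag
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.mllib.evaluation.RankingMetrics

object JavaRankingMetrics {
  // Supply a "fake" ClassTag so Java callers never have to construct one themselves.
  def of[E](predictionAndLabels: JavaRDD[(Array[E], Array[E])]): RankingMetrics[E] = {
    implicit val tag: ClassTag[E] = ClassTag.AnyRef.asInstanceOf[ClassTag[E]]
    new RankingMetrics(predictionAndLabels.rdd)
  }
}
{code}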
[jira] [Commented] (SPARK-7530) Add API to get the current state of a StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538126#comment-14538126 ] Apache Spark commented on SPARK-7530: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/6058 Add API to get the current state of a StreamingContext -- Key: SPARK-7530 URL: https://issues.apache.org/jira/browse/SPARK-7530 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7530) Add API to get the current state of a StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7530: --- Assignee: Apache Spark (was: Tathagata Das) Add API to get the current state of a StreamingContext -- Key: SPARK-7530 URL: https://issues.apache.org/jira/browse/SPARK-7530 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Apache Spark Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5575) Artificial neural networks for MLlib deep learning
[ https://issues.apache.org/jira/browse/SPARK-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538209#comment-14538209 ] Alexander Ulanov commented on SPARK-5575: - Current implementation: https://github.com/avulanov/spark/tree/ann-interface-gemm Artificial neural networks for MLlib deep learning -- Key: SPARK-5575 URL: https://issues.apache.org/jira/browse/SPARK-5575 Project: Spark Issue Type: Umbrella Components: MLlib Affects Versions: 1.2.0 Reporter: Alexander Ulanov Goal: Implement various types of artificial neural networks Motivation: deep learning trend Requirements: 1) Basic abstractions such as Neuron, Layer, Error, Regularization, Forward and Backpropagation etc. should be implemented as traits or interfaces, so they can be easily extended or reused 2) Implement complex abstractions, such as feed-forward and recurrent networks 3) Implement multilayer perceptron (MLP), convolutional networks (LeNet), autoencoder (sparse and denoising), stacked autoencoder, restricted Boltzmann machines (RBM), deep belief networks (DBN) etc. 4) Implement or reuse supporting constructs, such as classifiers, normalizers, poolers, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
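For SPARK-5575, requirement 1 could be sketched with traits along these lines (names and signatures are assumptions, not the design in the linked branch):
{code}
// Illustrative shapes only; the actual interface in the ann-interface-gemm branch may differ.
trait Layer {
  def forward(input: Array[Double]): Array[Double]
  def backward(input: Array[Double], outputDelta: Array[Double]): Array[Double]
}

trait ErrorFunction {
  def loss(output: Array[Double], target: Array[Double]): Double
  def gradient(output: Array[Double], target: Array[Double]): Array[Double]
}

// A feed-forward network is then just a composition of layers.
class FeedForwardNetwork(layers: Seq[Layer]) {
  def predict(input: Array[Double]): Array[Double] =
    layers.foldLeft(input)((activation, layer) => layer.forward(activation))
}
{code}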
[jira] [Created] (SPARK-7533) Decrease spacing between AM-RM heartbeats.
Sandy Ryza created SPARK-7533: - Summary: Decrease spacing between AM-RM heartbeats. Key: SPARK-7533 URL: https://issues.apache.org/jira/browse/SPARK-7533 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.3.1 Reporter: Sandy Ryza The current default of spark.yarn.scheduler.heartbeat.interval-ms is 5 seconds. This is really long. For reference, the MR equivalent is 1 second. To avoid noise and unnecessary communication, we could have a fast rate for when we're waiting for executors and a slow rate for when we're just heartbeating. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
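For SPARK-7533, the dual-rate idea could be sketched as follows (illustrative only; the Allocator trait and the 200 ms fast interval are stand-ins, not the actual YarnAllocator API or a decided value):
{code}
trait Allocator {
  def allocateResources(): Unit   // one AM -> RM heartbeat
  def numPendingAllocate: Int     // executors requested but not yet granted
}

object HeartbeatSketch {
  def heartbeatLoop(allocator: Allocator, isFinished: () => Boolean): Unit = {
    val slowIntervalMs = 5000L // current default of spark.yarn.scheduler.heartbeat.interval-ms
    val fastIntervalMs = 200L  // assumed faster rate while executors are still pending
    while (!isFinished()) {
      allocator.allocateResources()
      val sleepMs = if (allocator.numPendingAllocate > 0) fastIntervalMs else slowIntervalMs
      Thread.sleep(sleepMs)
    }
  }
}
{code}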
[jira] [Created] (SPARK-7531) Install GPG on Jenkins machines
Patrick Wendell created SPARK-7531: -- Summary: Install GPG on Jenkins machines Key: SPARK-7531 URL: https://issues.apache.org/jira/browse/SPARK-7531 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp This one is also required for us to cut regular snapshot releases from Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7534) Fix the Stage table when a stage is missing
[ https://issues.apache.org/jira/browse/SPARK-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7534: --- Assignee: (was: Apache Spark) Fix the Stage table when a stage is missing --- Key: SPARK-7534 URL: https://issues.apache.org/jira/browse/SPARK-7534 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7534) Fix the Stage table when a stage is missing
[ https://issues.apache.org/jira/browse/SPARK-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538222#comment-14538222 ] Apache Spark commented on SPARK-7534: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/6061 Fix the Stage table when a stage is missing --- Key: SPARK-7534 URL: https://issues.apache.org/jira/browse/SPARK-7534 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Shixiong Zhu Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception
[ https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538288#comment-14538288 ] Joseph K. Bradley commented on SPARK-7483: -- I agree; it should work, but I'm not sure why it's failing. I'm not that familiar with Kryo, but I'll ask around. Thanks for reporting this! [MLLib] Using Kryo with FPGrowth fails with an exception Key: SPARK-7483 URL: https://issues.apache.org/jira/browse/SPARK-7483 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.1 Reporter: Tomasz Bartczak Priority: Minor When using the FPGrowth algorithm with KryoSerializer, Spark fails with
{code}
Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final scala.collection.mutable.ListBuffer field org.apache.spark.mllib.fpm.FPTree$Summary.nodes to scala.collection.mutable.ArrayBuffer
Serialization trace:
nodes (org.apache.spark.mllib.fpm.FPTree$Summary)
org$apache$spark$mllib$fpm$FPTree$$summaries (org.apache.spark.mllib.fpm.FPTree)
{code}
This can be easily reproduced in the Spark codebase by setting {code} conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") {code} and running FPGrowthSuite. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
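For SPARK-7483, a standalone reproduction along the lines the report describes might look like this (a sketch assuming Spark 1.3.x in local mode; the application name and transaction data are made up):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth

object FPGrowthKryoRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("FPGrowthKryoRepro")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Tiny made-up transaction data, just enough to force FPTree structures to be shuffled.
    val transactions = sc.parallelize(Seq(
      Array("a", "b", "c"),
      Array("a", "b"),
      Array("b", "c")))

    // The reported KryoException surfaces during the shuffle triggered by run().
    val model = new FPGrowth()
      .setMinSupport(0.5)
      .setNumPartitions(2)
      .run(transactions)

    model.freqItemsets.collect().foreach(println)
    sc.stop()
  }
}
{code}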