[jira] [Commented] (SPARK-4430) Apache RAT Checks fail spuriously on test files

2014-11-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213873#comment-14213873
 ] 

Sean Owen commented on SPARK-4430:
--

I imagine the real issue is that the test should clean up these files if it 
does not do so already. They aren't in the source tree, so it is not really a 
RAT config issue. If the files were left behind because tests crashed or were 
killed, just delete them.
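
For illustration, a minimal sketch of the kind of teardown being suggested 
(the directory name and hook are hypothetical, not FailureSuite's actual code):

{code}
import java.io.File

// Delete a test's scratch directory recursively; tolerate a vanished file.
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) {
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  }
  f.delete()
}

// e.g. in the suite's after { ... } block, assuming it records its temp dir:
// deleteRecursively(new File(tempDir))
{code}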

> Apache RAT Checks fail spuriously on test files
> ---
>
> Key: SPARK-4430
> URL: https://issues.apache.org/jira/browse/SPARK-4430
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Ryan Williams
>
> Several of my recent runs of {{./dev/run-tests}} have failed quickly due to 
> Apache RAT checks, e.g.:
> {code}
> $ ./dev/run-tests
> =
> Running Apache RAT checks
> =
> Could not find Apache license headers in the following files:
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/28
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/29
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/30
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/10
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/11
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/12
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/13
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/14
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/15
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/16
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/17
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/18
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/19
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/20
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/21
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/22
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/23
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/24
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/25
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/26
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/27
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/28
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/29
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/30
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/7
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/8
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/9
> [error] Got a return code of 1 on line 114 of the run-tests script.
> {code}
> I think it's fair to say that these are not useful errors for {{run-tests}} 
> to crash on. Ideally we could tell the linter which files we care about 
> having it lint and which we don't.






[jira] [Updated] (SPARK-4426) The symbol of BitwiseOr is wrong, should not be '&'

2014-11-15 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-4426:
---
Target Version/s: 1.2.0  (was: 1.2.0, 1.3.0)

> The symbol of BitwiseOr is wrong, should not be '&'
> ---
>
> Key: SPARK-4426
> URL: https://issues.apache.org/jira/browse/SPARK-4426
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 1.2.0
>
>
> The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should 
> be '|'.






[jira] [Resolved] (SPARK-4426) The symbol of BitwiseOr is wrong, should not be '&'

2014-11-15 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-4426.

   Resolution: Fixed
Fix Version/s: 1.2.0
 Assignee: Kousuke Saruta

> The symbol of BitwiseOr is wrong, should not be '&'
> ---
>
> Key: SPARK-4426
> URL: https://issues.apache.org/jira/browse/SPARK-4426
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 1.2.0
>
>
> The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should 
> be '|'.






[jira] [Closed] (SPARK-4427) In Spark Streaming, when we perform window operations, it will calculate based on system time; I need to override it, I mean instead of getting the current time the app needs to get it from my text file

2014-11-15 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-4427.
--

> In Spark Streaming, when we perform window operations, it will calculate 
> based on system time; I need to override it, I mean instead of getting the 
> current time, the app needs to get it from my text file.
> 
>
> Key: SPARK-4427
> URL: https://issues.apache.org/jira/browse/SPARK-4427
> Project: Spark
>  Issue Type: Bug
>Reporter: ch.prasad
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Please provide a solution ASAP.
> In a window operation, when we give a window size, it gets data from RDDs by 
> calculating the window against the current time. I need to change the current 
> time to be read from my file instead.






[jira] [Resolved] (SPARK-4419) Upgrade Snappy Java to 1.1.1.6

2014-11-15 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-4419.

   Resolution: Fixed
Fix Version/s: 1.2.0
 Assignee: Josh Rosen

> Upgrade Snappy Java to 1.1.1.6
> --
>
> Key: SPARK-4419
> URL: https://issues.apache.org/jira/browse/SPARK-4419
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
> Fix For: 1.2.0
>
>
> We should upgrade the Snappy Java library to pick up its error-reporting 
> improvements. I tried this previously in SPARK-4056 but had to revert that PR 
> due to a memory leak / regression in Snappy Java.






[jira] [Commented] (SPARK-4404) SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends

2014-11-15 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213852#comment-14213852
 ] 

Davies Liu commented on SPARK-4404:
---

This JIRA has been re-opened; the new bug was introduced by the patch for this 
JIRA.

Should I create a new JIRA for it?

> SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process 
> ends 
> -
>
> Key: SPARK-4404
> URL: https://issues.apache.org/jira/browse/SPARK-4404
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
> Fix For: 1.2.0
>
>
> When we have spark.driver.extra* or spark.driver.memory in 
> SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use 
> SparkSubmitDriverBootstrapper to launch the driver.
> If we get the process id of SparkSubmitDriverBootstrapper and want to kill it 
> while it is running, we expect its SparkSubmit sub-process to stop as well.






[jira] [Commented] (SPARK-4433) Racing condition in zipWithIndex

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213850#comment-14213850
 ] 

Apache Spark commented on SPARK-4433:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/3291

> Racing condition in zipWithIndex
> 
>
> Key: SPARK-4433
> URL: https://issues.apache.org/jira/browse/SPARK-4433
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.1, 1.2.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> Spark hangs with the following code:
> {code}
> sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
> {code}
> This is because ZippedWithIndexRDD triggers a job in getPartitions, and this 
> causes a deadlock in DAGScheduler.getPreferredLocs.






[jira] [Commented] (SPARK-2335) k-Nearest Neighbor classification and regression for MLLib

2014-11-15 Thread Kaushik Ranjan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213849#comment-14213849
 ] 

Kaushik Ranjan commented on SPARK-2335:
---

Ha ha. "Shepherd" and I were working on this together.

[~bgawalt] - if you could review the code and suggest changes (if any), I can 
take it forward.

> k-Nearest Neighbor classification and regression for MLLib
> --
>
> Key: SPARK-2335
> URL: https://issues.apache.org/jira/browse/SPARK-2335
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Brian Gawalt
>Priority: Minor
>  Labels: features
>
> The k-Nearest Neighbor model for classification and regression problems is a 
> simple and intuitive approach, offering a straightforward path to creating 
> non-linear decision/estimation contours. Its downsides -- high variance 
> (sensitivity to the known training data set) and computational intensity for 
> estimating new point labels -- both play to Spark's big data strengths: lots 
> of data mitigates data concerns; lots of workers mitigate computational 
> latency. 
> We should include kNN models as options in MLLib.






[jira] [Created] (SPARK-4433) Racing condition in zipWithIndex

2014-11-15 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-4433:


 Summary: Racing condition in zipWithIndex
 Key: SPARK-4433
 URL: https://issues.apache.org/jira/browse/SPARK-4433
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.1, 1.2.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


Spark hangs with the following code:

{code}
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
{code}

This is because ZippedWithIndexRDD triggers a job in getPartitions, and this 
causes a deadlock in DAGScheduler.getPreferredLocs.
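
As a hedged illustration of the failure mode, and one possible user-side 
workaround (zipWithUniqueId assigns ids without launching a job, though the 
ids are unique rather than consecutive):

{code}
// Hangs as reported: ZippedWithIndexRDD.getPartitions launches a job while
// DAGScheduler.getPreferredLocs is being computed for repartition().
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()

// Possible workaround: no job is needed to compute the ids.
sc.parallelize(1 to 10).zipWithUniqueId.repartition(10).count()
{code}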






[jira] [Commented] (SPARK-4432) Resource(InStream) is not closed in TachyonStore

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213845#comment-14213845
 ] 

Apache Spark commented on SPARK-4432:
-

User 'shimingfei' has created a pull request for this issue:
https://github.com/apache/spark/pull/3290

> Resource(InStream) is not closed in TachyonStore
> 
>
> Key: SPARK-4432
> URL: https://issues.apache.org/jira/browse/SPARK-4432
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.1.0
>Reporter: shimingfei
>
> In TachyonStore, the InStream is not closed after data is read from Tachyon, 
> which leaves the blocks in Tachyon locked after they are accessed.






[jira] [Created] (SPARK-4432) Resource(InStream) is not closed in TachyonStore

2014-11-15 Thread shimingfei (JIRA)
shimingfei created SPARK-4432:
-

 Summary: Resource(InStream) is not closed in TachyonStore
 Key: SPARK-4432
 URL: https://issues.apache.org/jira/browse/SPARK-4432
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 1.1.0
Reporter: shimingfei


In TachyonStore, the InStream is not closed after data is read from Tachyon, 
which leaves the blocks in Tachyon locked after they are accessed.
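
A minimal sketch of the shape of the fix (the surrounding names, such as the 
Tachyon file handle and block size, are placeholders rather than TachyonStore's 
exact code):

{code}
import java.nio.ByteBuffer
import tachyon.client.ReadType

val is = file.getInStream(ReadType.CACHE)  // `file` is an assumed TachyonFile
try {
  val bytes = new Array[Byte](size.toInt)  // `size` is the block length
  is.read(bytes, 0, bytes.length)
  Some(ByteBuffer.wrap(bytes))
} finally {
  is.close()  // the missing call: closing releases Tachyon's lock on the block
}
{code}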






[jira] [Commented] (SPARK-4404) SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends

2014-11-15 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213840#comment-14213840
 ] 

Marcelo Vanzin commented on SPARK-4404:
---

Hmmm, the bug title and the bug description don't match as I read them.

Title: "SparkSubmitDriverBootstrapper should stop after SparkSubmit ends"
Description: "killing SparkSubmitDriverBootstrapper should also kill 
SparkSubmit"

Pardon me if I misunderstood something, but could you clarify what's not 
working as expected?

> SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process 
> ends 
> -
>
> Key: SPARK-4404
> URL: https://issues.apache.org/jira/browse/SPARK-4404
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
> Fix For: 1.2.0
>
>
> When we have spark.driver.extra* or spark.driver.memory in 
> SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use 
> SparkSubmitDriverBootstrapper to launch the driver.
> If we get the process id of SparkSubmitDriverBootstrapper and want to kill it 
> while it is running, we expect its SparkSubmit sub-process to stop as well.






[jira] [Commented] (SPARK-4404) SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends

2014-11-15 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213838#comment-14213838
 ] 

Davies Liu commented on SPARK-4404:
---

Also, pyspark fails to start if spark.driver.memory is set; I have not 
investigated the details yet.

> SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process 
> ends 
> -
>
> Key: SPARK-4404
> URL: https://issues.apache.org/jira/browse/SPARK-4404
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
> Fix For: 1.2.0
>
>
> When we have spark.driver.extra* or spark.driver.memory in 
> SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use 
> SparkSubmitDriverBootstrapper to launch the driver.
> If we get the process id of SparkSubmitDriverBootstrapper and want to kill it 
> while it is running, we expect its SparkSubmit sub-process to stop as well.






[jira] [Updated] (SPARK-2335) k-Nearest Neighbor classification and regression for MLLib

2014-11-15 Thread Brian Gawalt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Gawalt updated SPARK-2335:

Shepherd: Ashutosh Trivedi

> k-Nearest Neighbor classification and regression for MLLib
> --
>
> Key: SPARK-2335
> URL: https://issues.apache.org/jira/browse/SPARK-2335
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Brian Gawalt
>Priority: Minor
>  Labels: features
>
> The k-Nearest Neighbor model for classification and regression problems is a 
> simple and intuitive approach, offering a straightforward path to creating 
> non-linear decision/estimation contours. Its downsides -- high variance 
> (sensitivity to the known training data set) and computational intensity for 
> estimating new point labels -- both play to Spark's big data strengths: lots 
> of data mitigates data concerns; lots of workers mitigate computational 
> latency. 
> We should include kNN models as options in MLLib.






[jira] [Commented] (SPARK-4404) SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213832#comment-14213832
 ] 

Apache Spark commented on SPARK-4404:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/3289

> SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process 
> ends 
> -
>
> Key: SPARK-4404
> URL: https://issues.apache.org/jira/browse/SPARK-4404
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
> Fix For: 1.2.0
>
>
> When we have spark.driver.extra* or spark.driver.memory in 
> SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use 
> SparkSubmitDriverBootstrapper to launch the driver.
> If we get the process id of SparkSubmitDriverBootstrapper and want to kill it 
> while it is running, we expect its SparkSubmit sub-process to stop as well.






[jira] [Reopened] (SPARK-4404) SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends

2014-11-15 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reopened SPARK-4404:
---

After this patch, SparkSubmitDriverBootstrapper will not exit if SparkSubmit 
dies first.
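
To make the desired coupling concrete, here is a hedged sketch (not the actual 
bootstrapper code; the launch command is a placeholder) of the two-way behavior 
under discussion:

{code}
object BootstrapperLifecycleSketch {
  def main(args: Array[String]): Unit = {
    // Launch the sub-process (stand-in for the real SparkSubmit command line).
    val process = new ProcessBuilder("bin/spark-submit").inheritIO().start()

    // If this JVM is killed, take the sub-process down with it.
    Runtime.getRuntime.addShutdownHook(new Thread {
      override def run(): Unit = process.destroy()
    })

    // If the sub-process dies first, exit too, propagating its exit code.
    System.exit(process.waitFor())
  }
}
{code}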

> SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process 
> ends 
> -
>
> Key: SPARK-4404
> URL: https://issues.apache.org/jira/browse/SPARK-4404
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
> Fix For: 1.2.0
>
>
> When we have spark.driver.extra* or spark.driver.memory in 
> SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use 
> SparkSubmitDriverBootstrapper to launch the driver.
> If we get the process id of SparkSubmitDriverBootstrapper and want to kill it 
> while it is running, we expect its SparkSubmit sub-process to stop as well.






[jira] [Commented] (SPARK-4431) Implement efficient activeIterator for dense and sparse vector

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213826#comment-14213826
 ] 

Apache Spark commented on SPARK-4431:
-

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/3288

> Implement efficient activeIterator for dense and sparse vector
> --
>
> Key: SPARK-4431
> URL: https://issues.apache.org/jira/browse/SPARK-4431
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: DB Tsai
>
> Previously, we were using Breeze's activeIterator to access the non-zero 
> elements in a sparse vector, and explicitly skipping the zeros in dense/sparse 
> vectors using pattern matching. Due to the overhead, we switched back to a 
> native `while loop` in #SPARK-4129.
> However, #SPARK-4129 requires de-referencing dv.values/sv.values on each 
> access to a value, and any zeros in dense and sparse vectors are only skipped 
> inside the add function call; the overall penalty is around 10% compared with 
> de-referencing once outside the while block and checking for zero before 
> calling the add function. The code is also branched for dense and sparse 
> vectors, which is not easy to maintain in the long term.
> Not only does this activeIterator implementation increase performance, but the 
> abstraction over accessing the non-zero elements of different vector types 
> also helps the maintainability of the codebase. In this PR, only 
> MultivariateOnlineSummarizer uses the new API as an example; others can be 
> migrated to activeIterator later.
> Benchmarking with the mnist8m dataset on a single JVM, with the first 200 
> samples loaded in memory and the run repeated 5000 times:
> Before change: 
> Sparse Vector - 30.02
> Dense Vector - 38.27
> After this optimization:
> Sparse Vector - 27.54
> Dense Vector - 35.13






[jira] [Created] (SPARK-4431) Implement efficient activeIterator for dense and sparse vector

2014-11-15 Thread DB Tsai (JIRA)
DB Tsai created SPARK-4431:
--

 Summary: Implement efficient activeIterator for dense and sparse 
vector
 Key: SPARK-4431
 URL: https://issues.apache.org/jira/browse/SPARK-4431
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: DB Tsai


Previously, we were using Breeze's activeIterator to access the non-zero 
elements in a sparse vector, and explicitly skipping the zeros in dense/sparse 
vectors using pattern matching. Due to the overhead, we switched back to a 
native `while loop` in #SPARK-4129.

However, #SPARK-4129 requires de-referencing dv.values/sv.values on each access 
to a value, and any zeros in dense and sparse vectors are only skipped inside 
the add function call; the overall penalty is around 10% compared with 
de-referencing once outside the while block and checking for zero before 
calling the add function. The code is also branched for dense and sparse 
vectors, which is not easy to maintain in the long term.

Not only does this activeIterator implementation increase performance, but the 
abstraction over accessing the non-zero elements of different vector types also 
helps the maintainability of the codebase. In this PR, only 
MultivariateOnlineSummarizer uses the new API as an example; others can be 
migrated to activeIterator later.

Benchmarking with the mnist8m dataset on a single JVM, with the first 200 
samples loaded in memory and the run repeated 5000 times:

Before change: 
Sparse Vector - 30.02
Dense Vector - 38.27

After this optimization:
Sparse Vector - 27.54
Dense Vector - 35.13
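
A minimal sketch of what such an abstraction could look like (illustrative 
names, not the final MLlib API): one loop body serves both vector types, the 
values array is de-referenced once, and callers can skip zeros themselves:

{code}
trait ActiveVector {
  // Visit each stored (index, value) pair.
  def foreachActive(f: (Int, Double) => Unit): Unit
}

class DenseVec(val values: Array[Double]) extends ActiveVector {
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    val vs = values               // de-reference once, outside the loop
    var i = 0
    while (i < vs.length) { f(i, vs(i)); i += 1 }
  }
}

class SparseVec(val indices: Array[Int], val values: Array[Double])
  extends ActiveVector {
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    val ids = indices
    val vs = values
    var i = 0
    while (i < vs.length) { f(ids(i), vs(i)); i += 1 }
  }
}

// A summarizer-style caller can then guard on zero before doing work:
// vec.foreachActive { (i, v) => if (v != 0.0) add(i, v) }
{code}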






[jira] [Commented] (SPARK-4430) Apache RAT Checks fail spuriously on test files

2014-11-15 Thread Ryan Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213811#comment-14213811
 ] 

Ryan Williams commented on SPARK-4430:
--

I did find [this RAT JIRA|https://issues.apache.org/jira/browse/RAT-161] that 
seems somewhat related, but if there's anything we could do to work around this 
in Spark in the shorter term, that would be great too.

> Apache RAT Checks fail spuriously on test files
> ---
>
> Key: SPARK-4430
> URL: https://issues.apache.org/jira/browse/SPARK-4430
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Ryan Williams
>
> Several of my recent runs of {{./dev/run-tests}} have failed quickly due to 
> Apache RAT checks, e.g.:
> {code}
> $ ./dev/run-tests
> =
> Running Apache RAT checks
> =
> Could not find Apache license headers in the following files:
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/28
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/29
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/30
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/10
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/11
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/12
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/13
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/14
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/15
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/16
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/17
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/18
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/19
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/20
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/21
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/22
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/23
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/24
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/25
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/26
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/27
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/28
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/29
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/30
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/7
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/8
>  !? 
> /Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/9
> [error] Got a return code of 1 on line 114 of the run-tests script.
> {code}
> I think it's fair to say that these are not useful errors for {{run-tests}} 
> to crash on. Ideally we could tell the linter which files we care about 
> having it lint and which we don't.






[jira] [Created] (SPARK-4430) Apache RAT Checks fail spuriously on test files

2014-11-15 Thread Ryan Williams (JIRA)
Ryan Williams created SPARK-4430:


 Summary: Apache RAT Checks fail spuriously on test files
 Key: SPARK-4430
 URL: https://issues.apache.org/jira/browse/SPARK-4430
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Ryan Williams


Several of my recent runs of {{./dev/run-tests}} have failed quickly due to 
Apache RAT checks, e.g.:

{code}
$ ./dev/run-tests

=
Running Apache RAT checks
=
Could not find Apache license headers in the following files:
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/28
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/29
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b732c105-4fd3-4330-ba6d-a366b340c303/test/30
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/10
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/11
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/12
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/13
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/14
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/15
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/16
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/17
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/18
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/19
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/20
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/21
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/22
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/23
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/24
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/25
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/26
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/27
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/28
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/29
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/30
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/7
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/8
 !? 
/Users/ryan/c/spark/streaming/FailureSuite/b98beebe-98b0-472a-b4a5-060bcd91e401/test/9
[error] Got a return code of 1 on line 114 of the run-tests script.
{code}

I think it's fair to say that these are not useful errors for {{run-tests}} to 
crash on. Ideally we could tell the linter which files we care about having it 
lint and which we don't.








[jira] [Commented] (SPARK-4419) Upgrade Snappy Java to 1.1.1.6

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213744#comment-14213744
 ] 

Apache Spark commented on SPARK-4419:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3287

> Upgrade Snappy Java to 1.1.1.6
> --
>
> Key: SPARK-4419
> URL: https://issues.apache.org/jira/browse/SPARK-4419
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Josh Rosen
>Priority: Minor
>
> We should upgrade the Snappy Java library to pick up its error-reporting 
> improvements. I tried this previously in SPARK-4056 but had to revert that PR 
> due to a memory leak / regression in Snappy Java.






[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213739#comment-14213739
 ] 

Sean Owen commented on SPARK-4402:
--

Look at the code in PairRDDFunctions.saveAsHadoopDataset, which is what 
ultimately gets called. You'll see it try to check the output configuration 
upfront:

{code}
if (self.conf.getBoolean("spark.hadoop.validateOutputSpecs", true)) {
  // FileOutputFormat ignores the filesystem parameter
  val ignoredFs = FileSystem.get(hadoopConf)
  hadoopConf.getOutputFormat.checkOutputSpecs(ignoredFs, hadoopConf)
}
{code}

It's enabled by default. I wonder if the code path is somehow using a 
nonstandard OutputFormat that doesn't check?
But this check should raise an exception before the job starts if the output 
path exists, and it was committed in SPARK-1100 for 1.0.
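
For reference, the knob referenced in that snippet can be set from user code; a 
small usage sketch (the app name is arbitrary):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Opt out of the upfront output check (at your own risk): with this set to
// false, saveAs* will no longer fail fast when the output path exists.
val conf = new SparkConf()
  .setAppName("validate-output-specs-example")
  .set("spark.hadoop.validateOutputSpecs", "false")
val sc = new SparkContext(conf)
{code}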

> Output path validation of an action statement resulting in runtime exception
> 
>
> Key: SPARK-4402
> URL: https://issues.apache.org/jira/browse/SPARK-4402
> Project: Spark
>  Issue Type: Wish
>Reporter: Vijay
>Priority: Minor
>
> Output path validation happens at the time of statement execution, as part 
> of the lazy evaluation of an action statement. If the path already exists, a 
> runtime exception is thrown, so all the processing completed up to that point 
> is lost, which wastes resources (processing time and CPU usage).
> If this I/O-related validation were done before the RDD action operations, 
> this runtime exception could be avoided.
> I believe a similar validation feature is implemented in Hadoop as well.
> Example:
> SchemaRDD.saveAsTextFile() evaluates the path at runtime.






[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-15 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213729#comment-14213729
 ] 

Vijay commented on SPARK-4402:
--

Thanks for the reply, [~srowen].

This is a different scenario from SPARK-1100.

SPARK-1100 says that the output directory is overwritten if it exists; I think 
that fix works fine.

My concern is that Spark throws a runtime exception if the output directory 
exists. This happens after all the previous action statements have executed, 
resulting in abrupt termination of the program, and the results of those 
action statements are lost.

Please confirm whether this abrupt program termination is expected.
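
One user-side mitigation, sketched under the assumption of a SparkContext named 
{{sc}} and a hypothetical output path: check the path before running any 
actions, so upstream work is not lost to a late exception.

{code}
import org.apache.hadoop.fs.Path

val outPath = new Path("/tmp/out")  // hypothetical output location
val fs = outPath.getFileSystem(sc.hadoopConfiguration)
require(!fs.exists(outPath), s"Output path $outPath already exists")
// ... only now run the expensive transformations and save to outPath
{code}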

> Output path validation of an action statement resulting in runtime exception
> 
>
> Key: SPARK-4402
> URL: https://issues.apache.org/jira/browse/SPARK-4402
> Project: Spark
>  Issue Type: Wish
>Reporter: Vijay
>Priority: Minor
>
> Output path validation happens at the time of statement execution, as part 
> of the lazy evaluation of an action statement. If the path already exists, a 
> runtime exception is thrown, so all the processing completed up to that point 
> is lost, which wastes resources (processing time and CPU usage).
> If this I/O-related validation were done before the RDD action operations, 
> this runtime exception could be avoided.
> I believe a similar validation feature is implemented in Hadoop as well.
> Example:
> SchemaRDD.saveAsTextFile() evaluates the path at runtime.






[jira] [Resolved] (SPARK-4428) Use ${scala.binary.version} property for artifactId.

2014-11-15 Thread Mark Hamstra (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamstra resolved SPARK-4428.
-
Resolution: Won't Fix

> Use ${scala.binary.version} property for artifactId.
> 
>
> Key: SPARK-4428
> URL: https://issues.apache.org/jira/browse/SPARK-4428
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Takuya Ueshin
>







[jira] [Commented] (SPARK-4428) Use ${scala.binary.version} property for artifactId.

2014-11-15 Thread Mark Hamstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213664#comment-14213664
 ] 

Mark Hamstra commented on SPARK-4428:
-

This is not a bug, nor is it a major issue, nor is parameterizing artifactIds 
in this way permissible. This is a Won't Fix.

> Use ${scala.binary.version} property for artifactId.
> 
>
> Key: SPARK-4428
> URL: https://issues.apache.org/jira/browse/SPARK-4428
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Takuya Ueshin
>







[jira] [Resolved] (SPARK-4427) In Spark Streaming, when we perform window operations, it will calculate based on system time; I need to override it, I mean instead of getting the current time the app needs to get it from my text file

2014-11-15 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4427.
--
Resolution: Invalid

JIRAs are for reporting specific issues and proposing changes. This is a 
question, so it would be better asked on the mailing list. I suggest you reword 
the question, though, to be more complete and specific, as it's not quite clear 
what the issue is.

> In Spark Streaming, when we perform window operations, it will calculate 
> based on system time; I need to override it, I mean instead of getting the 
> current time, the app needs to get it from my text file.
> 
>
> Key: SPARK-4427
> URL: https://issues.apache.org/jira/browse/SPARK-4427
> Project: Spark
>  Issue Type: Bug
>Reporter: ch.prasad
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Please provide a solution ASAP.
> In a window operation, when we give a window size, it gets data from RDDs by 
> calculating the window against the current time. I need to change the current 
> time to be read from my file instead.






[jira] [Created] (SPARK-4429) Build for Scala 2.11 using sbt fails.

2014-11-15 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-4429:


 Summary: Build for Scala 2.11 using sbt fails.
 Key: SPARK-4429
 URL: https://issues.apache.org/jira/browse/SPARK-4429
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Takuya Ueshin


I tried to build for Scala 2.11 using sbt with the following command:

{quote}
$ sbt/sbt -Dscala-2.11 assembly
{quote}

but it fails with the following error messages:

{quote}
\[error\] (streaming-kafka/*:update) sbt.ResolveException: unresolved 
dependency: org.apache.kafka#kafka_2.11;0.8.0: not found
\[error\] (catalyst/*:update) sbt.ResolveException: unresolved dependency: 
org.scalamacros#quasiquotes_2.11;2.0.1: not found
{quote}







[jira] [Closed] (SPARK-4403) Elastic allocation (spark.dynamicAllocation.enabled) results in task never being executed.

2014-11-15 Thread Egor Pahomov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Egor Pahomov closed SPARK-4403.
---
Resolution: Invalid

> Elastic allocation (spark.dynamicAllocation.enabled) results in task never 
> being executed.
> 
>
> Key: SPARK-4403
> URL: https://issues.apache.org/jira/browse/SPARK-4403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.1.1
>Reporter: Egor Pahomov
> Attachments: ipython_out
>
>
> I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled = 
> true. The task never ends.
> Code:
> {code}
> import sys
> from random import random
> from operator import add
> partitions = 10
> n = 10 * partitions
> def f(_):
>     x = random() * 2 - 1
>     y = random() * 2 - 1
>     return 1 if x ** 2 + y ** 2 < 1 else 0
> count = sc.parallelize(xrange(1, n + 1), partitions).map(f).reduce(add)
> print "Pi is roughly %f" % (4.0 * count / n)
> {code}
> {code}
> IPYTHON_ARGS="notebook --profile=ydf --port $IPYTHON_PORT --port-retries=0 
> --ip='*' --no-browser"
> pyspark \
> --verbose \
> --master yarn-client \
> --conf spark.driver.port=$((RANDOM_PORT + 2)) \
> --conf spark.broadcast.port=$((RANDOM_PORT + 3)) \
> --conf spark.replClassServer.port=$((RANDOM_PORT + 4)) \
> --conf spark.blockManager.port=$((RANDOM_PORT + 5)) \
> --conf spark.executor.port=$((RANDOM_PORT + 6)) \
> --conf spark.fileserver.port=$((RANDOM_PORT + 7)) \
> --conf spark.shuffle.service.enabled=true \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.minExecutors=1 \
> --conf spark.dynamicAllocation.maxExecutors=10 \
> --conf spark.ui.port=$SPARK_UI_PORT
> {code}






[jira] [Commented] (SPARK-4428) Use ${scala.binary.version} property for artifactId.

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213593#comment-14213593
 ] 

Apache Spark commented on SPARK-4428:
-

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3285

> Use ${scala.binary.version} property for artifactId.
> 
>
> Key: SPARK-4428
> URL: https://issues.apache.org/jira/browse/SPARK-4428
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Takuya Ueshin
>







[jira] [Created] (SPARK-4428) Use ${scala.binary.version} property for artifactId.

2014-11-15 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-4428:


 Summary: Use ${scala.binary.version} property for artifactId.
 Key: SPARK-4428
 URL: https://issues.apache.org/jira/browse/SPARK-4428
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Takuya Ueshin









[jira] [Commented] (SPARK-4426) The symbol of BitwiseOr is wrong, should not be '&'

2014-11-15 Thread ch.prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213567#comment-14213567
 ] 

ch.prasad commented on SPARK-4426:
--

Please see my issue, SPARK-4427; I hope all of you will take a look.
Thanks!

> The symbol of BitwiseOr is wrong, should not be '&'
> ---
>
> Key: SPARK-4426
> URL: https://issues.apache.org/jira/browse/SPARK-4426
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Minor
>
> The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should 
> be '|'.






[jira] [Created] (SPARK-4427) In Spark Streaming, when we perform window operations, it will calculate based on system time; I need to override it, I mean instead of getting the current time the app needs to get it from my text file

2014-11-15 Thread ch.prasad (JIRA)
ch.prasad created SPARK-4427:


 Summary: In Spark Streaming, when we perform window operations, it 
will calculate based on system time; I need to override it, I mean instead of 
getting the current time, the app needs to get it from my text file.
 Key: SPARK-4427
 URL: https://issues.apache.org/jira/browse/SPARK-4427
 Project: Spark
  Issue Type: Bug
Reporter: ch.prasad


Please provide a solution ASAP.
In a window operation, when we give a window size, it gets data from RDDs by 
calculating the window against the current time. I need to change the current 
time to be read from my file instead.
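
For what it's worth, the batch clock in Spark Streaming is pluggable; Spark's 
own tests drive window operations with a manual clock instead of system time. 
A hedged sketch (this is an internal, unsupported setting, with the class name 
taken from the 1.x test code):

{code}
import org.apache.spark.SparkConf

// Replace the system clock used to trigger batches/windows. Advancing the
// clock from file-derived timestamps would then be up to test-style driver
// code; the public DStream API does not support event-time windows.
val conf = new SparkConf()
  .set("spark.streaming.clock", "org.apache.spark.streaming.util.ManualClock")
{code}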






[jira] [Commented] (SPARK-4426) The symbol of BitwiseOr is wrong, should not be '&'

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213472#comment-14213472
 ] 

Apache Spark commented on SPARK-4426:
-

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/3284

> The symbol of BitwiseOr is wrong, should not be '&'
> ---
>
> Key: SPARK-4426
> URL: https://issues.apache.org/jira/browse/SPARK-4426
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Priority: Minor
>
> The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should 
> be '|'.






[jira] [Created] (SPARK-4426) The symbol of BitwiseOr is wrong, should not be '&'

2014-11-15 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-4426:
-

 Summary: The symbol of BitwiseOr is wrong, should not be '&'
 Key: SPARK-4426
 URL: https://issues.apache.org/jira/browse/SPARK-4426
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Kousuke Saruta
Priority: Minor


The symbol of BitwiseOr is defined as '&' but I think it's wrong. It should be 
'|'.
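
The one-character fix being proposed, sketched with stubbed types (the real 
Expression/BinaryArithmetic live in org.apache.spark.sql.catalyst.expressions):

{code}
trait Expression
abstract class BinaryArithmetic extends Expression {
  def symbol: String  // the operator the expression prints/parses with
}

case class BitwiseOr(left: Expression, right: Expression)
  extends BinaryArithmetic {
  def symbol = "|"  // the bug: this was defined as "&"
}
{code}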






[jira] [Commented] (SPARK-4425) Handle NaN or Infinity cast to Timestamp correctly

2014-11-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213452#comment-14213452
 ] 

Apache Spark commented on SPARK-4425:
-

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3283

> Handle NaN or Infinity cast to Timestamp correctly
> --
>
> Key: SPARK-4425
> URL: https://issues.apache.org/jira/browse/SPARK-4425
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Takuya Ueshin
>
> {{Cast}} from {{NaN}} or {{Infinity}} of {{Double}} or {{Float}} to 
> {{TimestampType}} throws {{NumberFormatException}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4425) Handle NaN or Infinity cast to Timestamp correctly

2014-11-15 Thread Takuya Ueshin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-4425:
-
Summary: Handle NaN or Infinity cast to Timestamp correctly  (was: Handle 
NaN cast to Timestamp correctly)

> Handle NaN or Infinity cast to Timestamp correctly
> --
>
> Key: SPARK-4425
> URL: https://issues.apache.org/jira/browse/SPARK-4425
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Takuya Ueshin
>
> {{Cast}} from {{NaN}} or {{Infinity}} of {{Double}} or {{Float}} to 
> {{TimestampType}} throws {{NumberFormatException}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4425) Handle NaN cast to Timestamp correctly

2014-11-15 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-4425:


 Summary: Handle NaN cast to Timestamp correctly
 Key: SPARK-4425
 URL: https://issues.apache.org/jira/browse/SPARK-4425
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Takuya Ueshin


{{Cast}} from {{NaN}} or {{Infinity}} of {{Double}} or {{Float}} to 
{{TimestampType}} throws {{NumberFormatException}}.
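
A hedged sketch of the guard such a fix needs (Cast's real conversion logic 
differs; the seconds-to-millis step here is illustrative):

{code}
import java.sql.Timestamp

// NaN and +/-Infinity have no Timestamp value; return null (SQL semantics)
// instead of letting the numeric conversion throw NumberFormatException.
def castToTimestamp(d: Double): Timestamp =
  if (d.isNaN || d.isInfinite) null
  else new Timestamp((d * 1000).toLong)
{code}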


