[jira] [Commented] (SPARK-1719) spark.executor.extraLibraryPath isn't applied on yarn

2014-09-10 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129644#comment-14129644
 ] 

Wilfred Spiegelenburg commented on SPARK-1719:
--

I looked through the pull request linked here and the pull request that closed 
the linked one, and I cannot see any reference to 
spark.executor.extraLibraryPath. I went through trunk and the only place I can 
see it is in the Mesos code.
Can you explain how the change in https://github.com/apache/spark/pull/1031 
fixes the reported issue without referencing the setting at all?

> spark.executor.extraLibraryPath isn't applied on yarn
> -
>
> Key: SPARK-1719
> URL: https://issues.apache.org/jira/browse/SPARK-1719
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> Looking through the code for Spark on YARN, I don't see that 
> spark.executor.extraLibraryPath is being properly applied when it launches 
> executors.  It is using spark.driver.libraryPath in ClientBase.
> Note I didn't actually test it, so it's possible I missed something.
> I also think it is better to use LD_LIBRARY_PATH rather than -Djava.library.path.  
> Once java.library.path is set, the JVM doesn't search LD_LIBRARY_PATH.  In Hadoop 
> we switched to LD_LIBRARY_PATH instead of java.library.path.  See 
> https://issues.apache.org/jira/browse/MAPREDUCE-4072.  I'll split this into a 
> separate JIRA.
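A minimal sketch of the LD_LIBRARY_PATH approach described above (the function name and layout are hypothetical, not Spark's actual launcher code): prepending the extra library path to LD_LIBRARY_PATH in the executor's environment keeps the JVM's normal native-library search intact, whereas setting -Djava.library.path replaces it.

```python
import os

def executor_env(extra_lib_path, base_env=None):
    """Build an executor launch environment that prepends extra_lib_path
    to LD_LIBRARY_PATH instead of passing -Djava.library.path, so the
    JVM still consults LD_LIBRARY_PATH for native libraries."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (
        extra_lib_path + ":" + existing if existing else extra_lib_path
    )
    return env
```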



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2781) Analyzer should check resolution of LogicalPlans

2014-09-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2781.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

> Analyzer should check resolution of LogicalPlans
> 
>
> Key: SPARK-2781
> URL: https://issues.apache.org/jira/browse/SPARK-2781
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.1, 1.1.0
>Reporter: Aaron Staple
>Assignee: Michael Armbrust
> Fix For: 1.2.0
>
>
> Currently the Analyzer's CheckResolution rule checks that all attributes are 
> resolved by searching for unresolved Expressions.  But some LogicalPlans, 
> including Union, contain custom implementations of the resolved attribute that 
> validate other criteria in addition to checking for attribute resolution of 
> their descendants.  These LogicalPlans are not currently validated by the 
> CheckResolution implementation.
> As a result, it is currently possible to execute a query generated from 
> unresolved LogicalPlans.  One example is a UNION query that produces rows 
> with different data types in the same column:
> {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext._
> case class T1(value:Seq[Int])
> val t1 = sc.parallelize(Seq(T1(Seq(0,1))))
> t1.registerAsTable("t1")
> sqlContext.sql("SELECT value FROM t1 UNION SELECT 2 FROM t1").collect()
> {noformat}
> In this example, the type coercion implementation cannot unify array and 
> integer types.  One row contains an array in the returned column and the 
> other row contains an integer.  The result is:
> {noformat}
> res3: Array[org.apache.spark.sql.Row] = Array([List(0, 1)], [2])
> {noformat}
> I believe fixing this is a first step toward improving validation for Union 
> (and similar) plans.  (For instance, Union does not currently validate that 
> its children contain the same number of columns.)
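As a sketch of the extra check being proposed (the function name and the schema representation are hypothetical; Catalyst's real logic lives on each LogicalPlan's resolved attribute), a Union could be considered resolved only when every child exposes the same number of columns with matching types:

```python
def union_resolved(child_column_types):
    """Treat a Union as resolved only if every child plan has the same
    number of columns with matching types. child_column_types holds one
    list of column type names per child plan."""
    if not child_column_types:
        return False
    first = child_column_types[0]
    return all(types == first for types in child_column_types[1:])
```

With this check, the repro above would be rejected at analysis time, since one child yields an array column and the other an integer column.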






[jira] [Resolved] (SPARK-3447) Kryo NPE when serializing JListWrapper

2014-09-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3447.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

> Kryo NPE when serializing JListWrapper
> --
>
> Key: SPARK-3447
> URL: https://issues.apache.org/jira/browse/SPARK-3447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
> Fix For: 1.2.0
>
>
> Repro (provided by [~davies]):
> {code}
> from pyspark.sql import SQLContext; 
> SQLContext(sc).inferSchema(sc.parallelize([{"a": 
> [3]}]))._jschema_rdd.collect()
> {code}
> {code}
> 14/09/05 21:59:47 ERROR TaskResultGetter: Exception while getting task result
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> underlying (scala.collection.convert.Wrappers$JListWrapper)
> values (org.apache.spark.sql.catalyst.expressions.GenericRow)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
> at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:162)
> at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
> at 
> org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:514)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:355)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1276)
> at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:701)
> Caused by: java.lang.NullPointerException
> at 
> scala.collection.convert.Wrappers$MutableBufferWrapper.add(Wrappers.scala:80)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> ... 23 more
> {code}






[jira] [Updated] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'

2014-09-10 Thread Yi Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhou updated SPARK-3480:
---
Description: 
Symptom:
Run ./dev/run-tests and dump outputs as following:
SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 
-Pkinesis-asl"
[Warn] Java 8 tests will not run because JDK version is < 1.8.
=
Running Apache RAT checks
=
RAT checks passed.
=
Running Scala style checks
=
Scalastyle checks failed at following occurrences:
[error] Expected ID character
[error] Not a valid command: yarn-alpha
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Not a valid key: yarn-alpha
[error] yarn-alpha/scalastyle
[error]   ^

Possible Cause:
I checked dev/scalastyle and found that it invokes two separate tasks, 
'yarn-alpha/scalastyle' and 'yarn/scalastyle', like:
echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 
yarn-alpha/scalastyle \
  >> scalastyle.txt

echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 
yarn/scalastyle \
  >> scalastyle.txt

From the above error message, sbt seems to choke on the '/' separator. The 
checks run through after I manually changed the invocations to 
'yarn-alpha:scalastyle' and 'yarn:scalastyle'.

  was:
Symptom:
Run ./dev/run-tests and dump outputs as following:
SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 
-Pkinesis-asl"
[Warn] Java 8 tests will not run because JDK version is < 1.8.
=
Running Apache RAT checks
=
RAT checks passed.
=
Running Scala style checks
=
Scalastyle checks failed at following occurrences:
[error] Expected ID character
[error] Not a valid command: yarn-alpha
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Not a valid key: yarn-alpha
[error] yarn-alpha/scalastyle
[error]   ^

Possible Cause:
I checked dev/scalastyle and found that it invokes two separate tasks, 
'yarn-alpha/scalastyle' and 'yarn/scalastyle', like:
echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 
yarn-alpha/scalastyle \
  >> scalastyle.txt
# Check style with YARN built too
echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 
yarn/scalastyle \
  >> scalastyle.txt

From the above error message, sbt seems to choke on the '/' separator. The 
checks run through after I manually changed the invocations to 
'yarn-alpha:scalastyle' and 'yarn:scalastyle'.


> Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for 
> sbt build tool during 'Running Scala style checks'
> ---
>
> Key: SPARK-3480
> URL: https://issues.apache.org/jira/browse/SPARK-3480
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Yi Zhou
>Priority: Minor
>
> Symptom:
> Run ./dev/run-tests and dump outputs as following:
> SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 
> -Pkinesis-asl"
> [Warn] Java 8 tests will not run because JDK version is < 1.8.
> =
> Running Apache RAT checks
> =
> RAT checks passed.
> =
> Running Scala style checks
> =
> Scalastyle checks failed at following occurrences:
> [error] Expected ID character
> [error] Not a valid command: yarn-alpha
> [error] Expected project ID
> [error] Expected configuration
> [error] Expected ':' (if selecting a configuration)
> [error] Expected key
> [error] Not a valid key: yarn-alpha
> [error] yarn-alpha/scalastyle
> [error]   ^
> Possible Cause:
> I checked dev/scalastyle and found that it invokes two separate tasks, 
> 'yarn-alpha/scalastyle' and 'yarn/scalastyle', like:
> echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 
> yarn-alpha/scalastyle \
>   >> scalastyle.txt
> echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoo

[jira] [Created] (SPARK-3480) Throws out Not a valid command 'yarn-alpha/scalastyle' in dev/scalastyle for sbt build tool during 'Running Scala style checks'

2014-09-10 Thread Yi Zhou (JIRA)
Yi Zhou created SPARK-3480:
--

 Summary: Throws out Not a valid command 'yarn-alpha/scalastyle' in 
dev/scalastyle for sbt build tool during 'Running Scala style checks'
 Key: SPARK-3480
 URL: https://issues.apache.org/jira/browse/SPARK-3480
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Yi Zhou
Priority: Minor


Symptom:
Run ./dev/run-tests and dump outputs as following:
SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 
-Pkinesis-asl"
[Warn] Java 8 tests will not run because JDK version is < 1.8.
=
Running Apache RAT checks
=
RAT checks passed.
=
Running Scala style checks
=
Scalastyle checks failed at following occurrences:
[error] Expected ID character
[error] Not a valid command: yarn-alpha
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Not a valid key: yarn-alpha
[error] yarn-alpha/scalastyle
[error]   ^

Possible Cause:
I checked dev/scalastyle and found that it invokes two separate tasks, 
'yarn-alpha/scalastyle' and 'yarn/scalastyle', like:
echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-0.23 -Dhadoop.version=0.23.9 
yarn-alpha/scalastyle \
  >> scalastyle.txt
# Check style with YARN built too
echo -e "q\n" | sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 
yarn/scalastyle \
  >> scalastyle.txt

From the above error message, sbt seems to choke on the '/' separator. The 
checks run through after I manually changed the invocations to 
'yarn-alpha:scalastyle' and 'yarn:scalastyle'.






[jira] [Updated] (SPARK-3479) Have Jenkins show which tests failed in his GitHub messages

2014-09-10 Thread Nicholas Chammas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-3479:

Summary: Have Jenkins show which tests failed in his GitHub messages  (was: 
Have Jenkins show which tests failed in GitHub message)

> Have Jenkins show which tests failed in his GitHub messages
> ---
>
> Key: SPARK-3479
> URL: https://issues.apache.org/jira/browse/SPARK-3479
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Nicholas Chammas
>Priority: Minor
>







[jira] [Commented] (SPARK-3478) Profile Python tasks stage by stage in worker

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129503#comment-14129503
 ] 

Apache Spark commented on SPARK-3478:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/2351

> Profile Python tasks stage by stage in worker
> -
>
> Key: SPARK-3478
> URL: https://issues.apache.org/jira/browse/SPARK-3478
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> The Python code in the driver is easy for users to profile, but the code run 
> in the workers is distributed across the cluster and is not easy to profile.
> So we need a way to do the profiling in the workers and aggregate all the 
> results together for users.
> This can also be used to analyze bottlenecks in PySpark.






[jira] [Updated] (SPARK-3474) The env variable SPARK_MASTER_IP does not work

2014-09-10 Thread Chunjun Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunjun Xiao updated SPARK-3474:

Summary: The env variable SPARK_MASTER_IP does not work  (was: Rename the 
env variable SPARK_MASTER_IP to SPARK_MASTER_HOST)

> The env variable SPARK_MASTER_IP does not work
> --
>
> Key: SPARK-3474
> URL: https://issues.apache.org/jira/browse/SPARK-3474
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.1
>Reporter: Chunjun Xiao
>
> There's some inconsistency regarding the env variable used to specify the 
> Spark master host server.
> In the Spark source code (MasterArguments.scala), the env variable is 
> "SPARK_MASTER_HOST", while in the shell scripts (e.g., spark-env.sh, 
> start-master.sh) it's named "SPARK_MASTER_IP".
> This introduces an issue in some cases, e.g., if the Spark master is started 
> via "service spark-master start", which is built on the latest Bigtop 
> (refer to bigtop/spark-master.svc).
> In this case, "SPARK_MASTER_IP" has no effect.
> I suggest we change SPARK_MASTER_IP in the shell scripts to SPARK_MASTER_HOST.
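One way to keep both spellings working during such a rename, sketched here in Python (the helper name is hypothetical, and the real startup code is shell): honor SPARK_MASTER_HOST first and fall back to the legacy SPARK_MASTER_IP.

```python
import os

def resolve_master_host(env=None):
    """Prefer the newer SPARK_MASTER_HOST, fall back to the legacy
    SPARK_MASTER_IP, then to localhost, so scripts using either
    variable keep working."""
    env = os.environ if env is None else env
    return (
        env.get("SPARK_MASTER_HOST")
        or env.get("SPARK_MASTER_IP")
        or "localhost"
    )
```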






[jira] [Created] (SPARK-3479) Have Jenkins show which tests failed in GitHub message

2014-09-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-3479:
---

 Summary: Have Jenkins show which tests failed in GitHub message
 Key: SPARK-3479
 URL: https://issues.apache.org/jira/browse/SPARK-3479
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Nicholas Chammas
Priority: Minor









[jira] [Created] (SPARK-3478) Profile Python tasks stage by stage in worker

2014-09-10 Thread Davies Liu (JIRA)
Davies Liu created SPARK-3478:
-

 Summary: Profile Python tasks stage by stage in worker
 Key: SPARK-3478
 URL: https://issues.apache.org/jira/browse/SPARK-3478
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Reporter: Davies Liu


The Python code in the driver is easy for users to profile, but the code run in 
the workers is distributed across the cluster and is not easy to profile.

So we need a way to do the profiling in the workers and aggregate all the 
results together for users.

This can also be used to analyze bottlenecks in PySpark.
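A minimal sketch of the idea using the standard cProfile/pstats modules (the helper names are hypothetical and not part of any Spark pull request): each worker profiles its task, and the per-task stats are merged so the driver can present one profile per stage.

```python
import cProfile
import pstats

def run_task_profiled(func, *args):
    """Run one task function under cProfile in the worker and return
    (task result, per-task pstats.Stats)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    return result, pstats.Stats(profiler)

def aggregate_stats(stats_list):
    """Merge per-task stats so the driver can show a single aggregated
    profile for the whole stage."""
    merged = stats_list[0]
    for stats in stats_list[1:]:
        merged.add(stats)
    return merged
```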






[jira] [Commented] (SPARK-3477) Clean up code in Yarn Client / ClientBase

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129452#comment-14129452
 ] 

Apache Spark commented on SPARK-3477:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/2350

> Clean up code in Yarn Client / ClientBase
> -
>
> Key: SPARK-3477
> URL: https://issues.apache.org/jira/browse/SPARK-3477
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The existing code is difficult to read, has too many random layers of 
> indirection, breaks most style guides Spark has ever introduced, and is 
> duplicated in many places.
> We should clean it up!






[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-10 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129279#comment-14129279
 ] 

Hari Shreedharan commented on SPARK-3129:
-

[~sowen] Thanks! That fixed the issue! That saved me a whole lot of time! 

> Prevent data loss in Spark Streaming
> 
>
> Key: SPARK-3129
> URL: https://issues.apache.org/jira/browse/SPARK-3129
> Project: Spark
>  Issue Type: New Feature
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: SecurityFix.diff, StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and 
> the sending system cannot re-send the data (or the data has already expired on 
> the sender side). The attached document has more details.






[jira] [Updated] (SPARK-3477) Clean up code in Yarn Client / ClientBase

2014-09-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3477:
-
Issue Type: Improvement  (was: Bug)

> Clean up code in Yarn Client / ClientBase
> -
>
> Key: SPARK-3477
> URL: https://issues.apache.org/jira/browse/SPARK-3477
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> The existing code is difficult to read, has too many random layers of 
> indirection, breaks most style guides Spark has ever introduced, and is 
> duplicated in many places.
> We should clean it up!






[jira] [Created] (SPARK-3477) Clean up code in Yarn Client / ClientBase

2014-09-10 Thread Andrew Or (JIRA)
Andrew Or created SPARK-3477:


 Summary: Clean up code in Yarn Client / ClientBase
 Key: SPARK-3477
 URL: https://issues.apache.org/jira/browse/SPARK-3477
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.1.0
Reporter: Andrew Or
Assignee: Andrew Or


The existing code is difficult to read, has too many random layers of 
indirection, breaks most style guides Spark has ever introduced, and is 
duplicated in many places.

We should clean it up!






[jira] [Resolved] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree

2014-09-10 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-3272.
--
  Resolution: Fixed
   Fix Version/s: (was: 1.1.0)
  1.2.0
Target Version/s: 1.2.0  (was: 1.0.2)

> Calculate prediction for nodes separately from calculating information gain 
> for splits in decision tree
> ---
>
> Key: SPARK-3272
> URL: https://issues.apache.org/jira/browse/SPARK-3272
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.0.2
>Reporter: Qiping Li
>Assignee: Qiping Li
> Fix For: 1.2.0
>
>
> In the current implementation, the prediction for a node is calculated along 
> with the information gain stats for each possible split, even though the value 
> to predict for a specific node is determined no matter what the splits are.
> To save computation, we can calculate the prediction first and then calculate 
> the information gain stats for each split.
> This is also necessary if we want to support a minimum-instances-per-node 
> parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]), 
> because when no split satisfies the minimum-instances requirement, we don't 
> use the information gain of any split, but there should still be a way to get 
> the prediction value.
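A toy sketch of the separation being proposed (hypothetical helpers, not MLlib's actual DecisionTree code): the node's prediction depends only on the labels reaching the node, so it can be computed before, and independently of, any split's information gain.

```python
def node_prediction(labels):
    """Majority-class prediction for a node: depends only on the labels
    at the node, not on any candidate split."""
    return max(set(labels), key=labels.count)

def gini(labels):
    """Gini impurity of a label list."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(labels, left, right, impurity=gini):
    """Information gain of one candidate split, computed separately
    from (and after) the node's prediction."""
    n = len(labels)
    return (
        impurity(labels)
        - (len(left) / n) * impurity(left)
        - (len(right) / n) * impurity(right)
    )
```

Even if every candidate split fails a minimum-instances check and all gains are discarded, `node_prediction` still yields a usable value for the node.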






[jira] [Updated] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree

2014-09-10 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-3272:
-
Assignee: Qiping Li

> Calculate prediction for nodes separately from calculating information gain 
> for splits in decision tree
> ---
>
> Key: SPARK-3272
> URL: https://issues.apache.org/jira/browse/SPARK-3272
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.0.2
>Reporter: Qiping Li
>Assignee: Qiping Li
> Fix For: 1.2.0
>
>
> In the current implementation, the prediction for a node is calculated along 
> with the information gain stats for each possible split, even though the value 
> to predict for a specific node is determined no matter what the splits are.
> To save computation, we can calculate the prediction first and then calculate 
> the information gain stats for each split.
> This is also necessary if we want to support a minimum-instances-per-node 
> parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]), 
> because when no split satisfies the minimum-instances requirement, we don't 
> use the information gain of any split, but there should still be a way to get 
> the prediction value.







[jira] [Resolved] (SPARK-2207) Add minimum information gain and minimum instances per node as training parameters for decision tree.

2014-09-10 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2207.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2332
[https://github.com/apache/spark/pull/2332]

> Add minimum information gain and minimum instances per node as training 
> parameters for decision tree.
> -
>
> Key: SPARK-2207
> URL: https://issues.apache.org/jira/browse/SPARK-2207
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 1.0.0
>Reporter: Manish Amde
>Assignee: Qiping Li
> Fix For: 1.2.0
>
>







[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129256#comment-14129256
 ] 

Sean Owen commented on SPARK-3129:
--

[~hshreedharan] Just manually add the src dir in the parent to the module in 
IntelliJ. It'd be cooler if it was automatic, but not hard. There have been 
fixes proposed but I assume this is likely to go away as a problem only when 
yarn-alpha goes away.

> Prevent data loss in Spark Streaming
> 
>
> Key: SPARK-3129
> URL: https://issues.apache.org/jira/browse/SPARK-3129
> Project: Spark
>  Issue Type: New Feature
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: SecurityFix.diff, StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and 
> the sending system cannot re-send the data (or the data has already expired on 
> the sender side). The attached document has more details.






[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-10 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129252#comment-14129252
 ] 

Hari Shreedharan commented on SPARK-3129:
-

FYI here is the branch where I am doing development on this: 
https://github.com/harishreedharan/spark/tree/streaming-ha

Off topic: in IntelliJ, is there a way to get the yarn/stable stuff to 
recognize their base classes in common, so we can get autocomplete and syntax 
highlighting (even type awareness) to work properly?

> Prevent data loss in Spark Streaming
> 
>
> Key: SPARK-3129
> URL: https://issues.apache.org/jira/browse/SPARK-3129
> Project: Spark
>  Issue Type: New Feature
>Reporter: Hari Shreedharan
>Assignee: Hari Shreedharan
> Attachments: SecurityFix.diff, StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and 
> the sending system cannot re-send the data (or the data has already expired on 
> the sender side). The attached document has more details.






[jira] [Commented] (SPARK-2445) MesosExecutorBackend crashes in fine grained mode

2014-09-10 Thread Tomas Barton (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129226#comment-14129226
 ] 

Tomas Barton commented on SPARK-2445:
-

Well, the workaround doesn't seem to work. Here's the output with TRACE log 
level, produced by running the same job
{code}
MASTER=mesos://`cat /etc/mesos/zk` ./bin/run-example SparkLR
{code}
from a different node:
{code}
...
On iteration 5
14/09/11 00:02:45 INFO SparkContext: Starting job: reduce at SparkLR.scala:64
14/09/11 00:02:45 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:45 TRACE DAGScheduler: running: Set()
14/09/11 00:02:45 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:45 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:45 INFO DAGScheduler: Got job 4 (reduce at SparkLR.scala:64) 
with 2 output partitions (allowLocal=false)
14/09/11 00:02:45 INFO DAGScheduler: Final stage: Stage 4(reduce at 
SparkLR.scala:64)
14/09/11 00:02:45 INFO DAGScheduler: Parents of final stage: List()
14/09/11 00:02:45 INFO DAGScheduler: Missing parents: List()
14/09/11 00:02:45 DEBUG DAGScheduler: submitStage(Stage 4)
14/09/11 00:02:45 DEBUG DAGScheduler: missing: List()
14/09/11 00:02:45 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[5] at map at 
SparkLR.scala:62), which has no missing parents
14/09/11 00:02:45 DEBUG DAGScheduler: submitMissingTasks(Stage 4)
14/09/11 00:02:45 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 
(MappedRDD[5] at map at SparkLR.scala:62)
14/09/11 00:02:45 DEBUG DAGScheduler: New pending tasks: Set(ResultTask(4, 0), 
ResultTask(4, 1))
14/09/11 00:02:45 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
14/09/11 00:02:45 DEBUG TaskSetManager: Epoch for TaskSet 4.0: 5
14/09/11 00:02:45 DEBUG TaskSetManager: Valid locality levels for TaskSet 4.0: 
PROCESS_LOCAL, NODE_LOCAL, ANY
14/09/11 00:02:45 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:45 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:45 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:45 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:45 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_4, 
runningTasks: 0
14/09/11 00:02:45 INFO TaskSetManager: Starting task 4.0:1 as TID 15 on 
executor 20140910-231511-185277356-5050-425-101: 172.27.11.11 (PROCESS_LOCAL)
14/09/11 00:02:45 INFO TaskSetManager: Serialized task 4.0:1 as 667083 bytes in 
18 ms
14/09/11 00:02:45 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:45 INFO TaskSetManager: Starting task 4.0:0 as TID 16 on 
executor 20140910-231511-185277356-5050-425-102: 172.27.11.13 (PROCESS_LOCAL)
14/09/11 00:02:45 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:45 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:45 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:45 INFO TaskSetManager: Serialized task 4.0:0 as 667083 bytes in 
17 ms
14/09/11 00:02:45 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:45 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:45 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:45 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:46 INFO TaskSetManager: Re-queueing tasks for 
20140910-231511-185277356-5050-425-101 from TaskSet 4.0
14/09/11 00:02:46 WARN TaskSetManager: Lost TID 15 (task 4.0:1)
14/09/11 00:02:46 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:46 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:46 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:46 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:46 INFO DAGScheduler: Executor lost: 
20140910-231511-185277356-5050-425-101 (epoch 5)
14/09/11 00:02:46 INFO BlockManagerMasterActor: Trying to remove executor 
20140910-231511-185277356-5050-425-101 from BlockManagerMaster.
14/09/11 00:02:46 INFO BlockManagerMaster: Removed 
20140910-231511-185277356-5050-425-101 successfully in removeExecutor
14/09/11 00:02:46 DEBUG MapOutputTrackerMaster: Increasing epoch to 6
14/09/11 00:02:46 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:46 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:46 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:46 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:46 INFO DAGScheduler: Host added was in lost list earlier: 
172.27.11.11
14/09/11 00:02:46 TRACE DAGScheduler: Checking for newly runnable parent stages
14/09/11 00:02:46 TRACE DAGScheduler: running: Set(Stage 4)
14/09/11 00:02:46 TRACE DAGScheduler: waiting: Set()
14/09/11 00:02:46 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_4, 
runningTasks: 1
14/09/11 00:02:46 TRACE DAGScheduler: failed: Set()
14/09/11 00:02:46 INFO TaskSetManager: Starting task 4.0:1 as TID 17 on 
executor 20140910-231511-185277356-5050-425-101: 172.27.11.11 (PROCESS_LOCAL)
14/09/11 00:02:46 INFO TaskSetManager: Serialized task 4.0:1 as 667083 bytes in 
15 

[jira] [Commented] (SPARK-2445) MesosExecutorBackend crashes in fine grained mode

2014-09-10 Thread Tomas Barton (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129206#comment-14129206
 ] 

Tomas Barton commented on SPARK-2445:
-

Still the same issue. This error was produced by one of the Spark examples, _SparkLR_:
{code}
14/09/10 23:52:08 INFO BlockManagerInfo: Registering block manager 
172.27.11.13:51098 with 294.6 MB RAM
14/09/10 23:52:08 INFO BlockManagerInfo: Registering block manager 
172.27.11.11:59588 with 294.6 MB RAM
14/09/10 23:52:09 INFO BlockManagerInfo: Added rdd_0_0 in memory on 
172.27.11.11:59588 (size: 919.1 KB, free: 293.7 MB)
14/09/10 23:52:09 INFO BlockManagerInfo: Added rdd_0_1 in memory on 
172.27.11.11:59588 (size: 919.1 KB, free: 292.8 MB)
14/09/10 23:52:09 INFO BlockManagerInfo: Added rdd_0_0 in memory on 
172.27.11.13:51098 (size: 919.1 KB, free: 293.7 MB)
14/09/10 23:52:10 INFO TaskSetManager: Finished TID 9 in 5233 ms on 
172.27.11.11 (progress: 1/2)
14/09/10 23:52:10 INFO DAGScheduler: Completed ResultTask(2, 1)
14/09/10 23:52:10 INFO DAGScheduler: Completed ResultTask(2, 0)
14/09/10 23:52:10 INFO TaskSetManager: Finished TID 10 in 1958 ms on 
172.27.11.11 (progress: 2/2)
14/09/10 23:52:10 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have 
all completed, from pool 
14/09/10 23:52:10 INFO DAGScheduler: Stage 2 (reduce at SparkLR.scala:64) 
finished in 6.055 s
14/09/10 23:52:10 INFO SparkContext: Job finished: reduce at SparkLR.scala:64, 
took 6.079637186 s
On iteration 4
14/09/10 23:52:10 INFO SparkContext: Starting job: reduce at SparkLR.scala:64
14/09/10 23:52:10 INFO DAGScheduler: Got job 3 (reduce at SparkLR.scala:64) 
with 2 output partitions (allowLocal=false)
14/09/10 23:52:10 INFO DAGScheduler: Final stage: Stage 3(reduce at 
SparkLR.scala:64)
14/09/10 23:52:10 INFO DAGScheduler: Parents of final stage: List()
14/09/10 23:52:10 INFO DAGScheduler: Missing parents: List()
14/09/10 23:52:10 INFO DAGScheduler: Submitting Stage 3 (MappedRDD[4] at map at 
SparkLR.scala:62), which has no missing parents
14/09/10 23:52:10 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 
(MappedRDD[4] at map at SparkLR.scala:62)
14/09/10 23:52:10 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/09/10 23:52:10 INFO TaskSetManager: Starting task 3.0:0 as TID 11 on 
executor 20140910-231511-185277356-5050-425-101: 172.27.11.11 (PROCESS_LOCAL)
14/09/10 23:52:10 INFO TaskSetManager: Serialized task 3.0:0 as 667088 bytes in 
26 ms
14/09/10 23:52:10 INFO TaskSetManager: Starting task 3.0:1 as TID 12 on 
executor 20140910-231511-185277356-5050-425-101: 172.27.11.11 (PROCESS_LOCAL)
14/09/10 23:52:10 INFO TaskSetManager: Serialized task 3.0:1 as 667088 bytes in 
24 ms
14/09/10 23:52:10 INFO TaskSetManager: Re-queueing tasks for 
20140910-231511-185277356-5050-425-101 from TaskSet 3.0
14/09/10 23:52:10 WARN TaskSetManager: Lost TID 11 (task 3.0:0)
14/09/10 23:52:10 WARN TaskSetManager: Lost TID 12 (task 3.0:1)
14/09/10 23:52:10 INFO DAGScheduler: Executor lost: 
20140910-231511-185277356-5050-425-101 (epoch 4)
14/09/10 23:52:10 INFO BlockManagerMasterActor: Trying to remove executor 
20140910-231511-185277356-5050-425-101 from BlockManagerMaster.
14/09/10 23:52:10 INFO BlockManagerMaster: Removed 
20140910-231511-185277356-5050-425-101 successfully in removeExecutor
14/09/10 23:52:10 INFO DAGScheduler: Host added was in lost list earlier: 
172.27.11.11
14/09/10 23:52:10 INFO TaskSetManager: Starting task 3.0:1 as TID 13 on 
executor 20140910-231511-185277356-5050-425-102: 172.27.11.13 (PROCESS_LOCAL)
14/09/10 23:52:10 INFO TaskSetManager: Serialized task 3.0:1 as 667088 bytes in 
9 ms
14/09/10 23:52:14 INFO TaskSetManager: Starting task 3.0:0 as TID 14 on 
executor 20140910-231511-185277356-5050-425-101: 172.27.11.11 (NODE_LOCAL)
14/09/10 23:52:14 INFO TaskSetManager: Serialized task 3.0:0 as 667088 bytes in 
14 ms
14/09/10 23:52:14 ERROR BlockManagerMasterActor: Got two different block 
manager registrations on 20140910-231511-185277356-5050-425-102
{code}

A workaround is to switch to coarse-grained mode:

{code}
export SPARK_DAEMON_JAVA_OPTS="-Dspark.mesos.coarse=true"
{code}


> MesosExecutorBackend crashes in fine grained mode
> -
>
> Key: SPARK-2445
> URL: https://issues.apache.org/jira/browse/SPARK-2445
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.0.0
>Reporter: Dario Rexin
>
> When multiple instances of the MesosExecutorBackend are running on the same 
> slave, they will have the same executorId assigned (equal to the mesos 
> slaveId), but will have a different port (which is randomly assigned). 
> Because of this, it can not register a new BlockManager, because one is 
> already registered with the same executorId, but a different
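
The collision described above can be modeled with a toy registry. The classes here are hypothetical stand-ins (Spark's BlockManagerMasterActor keys on a richer BlockManagerId), but the executorId clash is the same:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the registration conflict: the registry is keyed by
// executorId (== the mesos slaveId in fine grained mode), so a second
// executor on the same slave arrives with the same id but a new port.
public class BlockManagerRegistry {
    // executorId -> "host:port" of the registered block manager
    private final Map<String, String> byExecutor = new HashMap<>();

    boolean register(String executorId, String hostPort) {
        String existing = byExecutor.get(executorId);
        if (existing != null && !existing.equals(hostPort)) {
            // mirrors "Got two different block manager registrations"
            return false;
        }
        byExecutor.put(executorId, hostPort);
        return true;
    }

    public static void main(String[] args) {
        BlockManagerRegistry r = new BlockManagerRegistry();
        boolean first = r.register("slave-102", "172.27.11.13:51098");
        boolean second = r.register("slave-102", "172.27.11.13:51099"); // new port
        System.out.println(first + " " + second);  // → true false
    }
}
```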

[jira] [Commented] (SPARK-3476) Yarn ClientBase.validateArgs memory checks wrong

2014-09-10 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129026#comment-14129026
 ] 

Thomas Graves commented on SPARK-3476:
--

Note: https://github.com/apache/spark/pull/2253 fixes up the AM memory 
calculations.

> Yarn ClientBase.validateArgs memory checks wrong
> 
>
> Key: SPARK-3476
> URL: https://issues.apache.org/jira/browse/SPARK-3476
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> The yarn ClientBase.validateArgs memory checks are no longer valid. The 
> overhead used to be taken out of what the user specified; now we add it on 
> top of what the user specifies. We can probably just remove these checks:
> (args.amMemory <= memoryOverhead) -> ("Error: AM memory size must be" +
>   "greater than: " + memoryOverhead),
> (args.executorMemory <= memoryOverhead) -> ("Error: Executor memory size" +
>   "must be greater than: " + memoryOverhead.toString)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3476) Yarn ClientBase.validateArgs memory checks wrong

2014-09-10 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-3476:


 Summary: Yarn ClientBase.validateArgs memory checks wrong
 Key: SPARK-3476
 URL: https://issues.apache.org/jira/browse/SPARK-3476
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Thomas Graves


The yarn ClientBase.validateArgs memory checks are no longer valid. The 
overhead used to be taken out of what the user specified; now we add it on top 
of what the user specifies. We can probably just remove these checks:

(args.amMemory <= memoryOverhead) -> ("Error: AM memory size must be" +
  "greater than: " + memoryOverhead),
(args.executorMemory <= memoryOverhead) -> ("Error: Executor memory size" +
  "must be greater than: " + memoryOverhead.toString)
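
A rough sketch of why the guard is now dead code, following the old-vs-new accounting the description gives. The numbers and class are illustrative, not Spark defaults:

```java
// Illustrative numbers only. Under the old accounting the overhead came out
// of the user's setting, so a small setting could leave little usable heap;
// under the new accounting the container request is user setting + overhead,
// so the "amMemory <= memoryOverhead" guard no longer protects anything.
public class MemoryCheck {
    static int oldUsableHeap(int requested, int overhead) {
        return requested - overhead;      // old model: could reach <= 0
    }
    static int newContainerSize(int requested, int overhead) {
        return requested + overhead;      // new model: always > requested
    }
    public static void main(String[] args) {
        System.out.println(oldUsableHeap(512, 384));    // → 128
        System.out.println(newContainerSize(512, 384)); // → 896
    }
}
```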







[jira] [Reopened] (SPARK-3411) Improve load-balancing of concurrently-submitted drivers across workers

2014-09-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reopened SPARK-3411:
--

> Improve load-balancing of concurrently-submitted drivers across workers
> ---
>
> Key: SPARK-3411
> URL: https://issues.apache.org/jira/browse/SPARK-3411
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
>Priority: Minor
> Fix For: 1.2.0
>
>
> If the waiting driver array is too big, the drivers in it will be dispatched 
> to the first worker we get(if it has enough resources), with or without the 
> Randomization.
> We should do randomization every time we dispatch a driver, in order to 
> better balance drivers.
> Update (2014/9/6): Doing shuffle is much slower, so we use round robin to 
> avoid it.
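
The round-robin idea from the update can be sketched like this. The dispatcher below is hypothetical, not the actual Master code:

```java
import java.util.List;

// Hypothetical round-robin dispatcher: instead of shuffling the worker list
// for every waiting driver, keep a cursor and advance it once per dispatch,
// so a burst of queued drivers spreads across workers.
public class RoundRobinDemo {
    static class RoundRobinDispatcher {
        private int cursor = 0;
        private final List<String> workers;
        RoundRobinDispatcher(List<String> workers) { this.workers = workers; }
        String nextWorker() {
            String w = workers.get(cursor);
            cursor = (cursor + 1) % workers.size();
            return w;
        }
    }

    public static void main(String[] args) {
        RoundRobinDispatcher d =
            new RoundRobinDispatcher(List.of("worker1", "worker2", "worker3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(d.nextWorker());  // worker1, worker2, worker3, worker1
        }
    }
}
```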






[jira] [Closed] (SPARK-3411) Improve load-balancing of concurrently-submitted drivers across workers

2014-09-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3411.

Resolution: Fixed

Re-opened and re-closed to edit a field... please disregard.

> Improve load-balancing of concurrently-submitted drivers across workers
> ---
>
> Key: SPARK-3411
> URL: https://issues.apache.org/jira/browse/SPARK-3411
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
>Priority: Minor
> Fix For: 1.2.0
>
>
> If the waiting driver array is too big, the drivers in it will be dispatched 
> to the first worker we get(if it has enough resources), with or without the 
> Randomization.
> We should do randomization every time we dispatch a driver, in order to 
> better balance drivers.
> Update (2014/9/6): Doing shuffle is much slower, so we use round robin to 
> avoid it.






[jira] [Updated] (SPARK-3411) Improve load-balancing of concurrently-submitted drivers across workers

2014-09-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-3411:
-
Affects Version/s: 1.1.0

> Improve load-balancing of concurrently-submitted drivers across workers
> ---
>
> Key: SPARK-3411
> URL: https://issues.apache.org/jira/browse/SPARK-3411
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.1.0
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
>Priority: Minor
> Fix For: 1.2.0
>
>
> If the waiting driver array is too big, the drivers in it will be dispatched 
> to the first worker we get(if it has enough resources), with or without the 
> Randomization.
> We should do randomization every time we dispatch a driver, in order to 
> better balance drivers.
> Update (2014/9/6): Doing shuffle is much slower, so we use round robin to 
> avoid it.






[jira] [Closed] (SPARK-3411) Improve load-balancing of concurrently-submitted drivers across workers

2014-09-10 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3411.

      Resolution: Fixed
   Fix Version/s: 1.2.0
        Assignee: WangTaoTheTonic
Target Version/s: 1.2.0

Fixed by https://github.com/apache/spark/pull/1106

> Improve load-balancing of concurrently-submitted drivers across workers
> ---
>
> Key: SPARK-3411
> URL: https://issues.apache.org/jira/browse/SPARK-3411
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: WangTaoTheTonic
>Assignee: WangTaoTheTonic
>Priority: Minor
> Fix For: 1.2.0
>
>
> If the waiting driver array is too big, the drivers in it will be dispatched 
> to the first worker we get(if it has enough resources), with or without the 
> Randomization.
> We should do randomization every time we dispatch a driver, in order to 
> better balance drivers.
> Update (2014/9/6): Doing shuffle is much slower, so we use round robin to 
> avoid it.






[jira] [Resolved] (SPARK-2096) Correctly parse dot notations for accessing an array of structs

2014-09-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2096.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

I'm going to mark this as fixed since I believe the parsing issues are 
resolved. If we want to add support for calling .fieldName on arrays of 
structs, we should open another PR.

> Correctly parse dot notations for accessing an array of structs
> ---
>
> Key: SPARK-2096
> URL: https://issues.apache.org/jira/browse/SPARK-2096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Yin Huai
>Priority: Minor
>  Labels: starter
> Fix For: 1.2.0
>
>
> For example, "arrayOfStruct" is an array of structs and every element of this 
> array has a field called "field1". "arrayOfStruct[0].field1" means to access 
> the value of "field1" for the first element of "arrayOfStruct", but the SQL 
> parser (in sql-core) treats "field1" as an alias. Also, 
> "arrayOfStruct.field1" means to access all values of "field1" in this array 
> of structs and return those values as an array, but the SQL parser cannot 
> resolve it.
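
The intended semantics can be illustrated with a toy Java model (hypothetical classes; the real work, of course, happens in the SQL parser):

```java
import java.util.Arrays;

// Toy model of the two accesses described above: indexing picks one struct's
// field, while the bare field name maps over the whole array of structs.
public class DotNotation {
    record Struct(String field1) {}

    public static void main(String[] args) {
        Struct[] arrayOfStruct = { new Struct("a"), new Struct("b") };

        // arrayOfStruct[0].field1: value of field1 for the first element
        String first = arrayOfStruct[0].field1();

        // arrayOfStruct.field1: all values of field1, returned as an array
        String[] all = Arrays.stream(arrayOfStruct)
                             .map(Struct::field1)
                             .toArray(String[]::new);

        System.out.println(first + " " + String.join(",", all));  // → a a,b
    }
}
```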






[jira] [Resolved] (SPARK-3475) dev/merge_spark_pr.py fails on mac

2014-09-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-3475.
--
Resolution: Invalid

Worked fine on another PR, so I'm not sure what happened on this one. I'll 
look into it more if it happens again.

> dev/merge_spark_pr.py fails on mac 
> ---
>
> Key: SPARK-3475
> URL: https://issues.apache.org/jira/browse/SPARK-3475
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> commit 
> https://github.com/apache/spark/commit/4f4a9884d9268ba9808744b3d612ac23c75f105a#diff-c321b6c82ebb21d8fd225abea9b7b74c
>  added a print statement in the run command. When I try to run it on a Mac, 
> it errors out when it hits these print statements. Perhaps there is a 
> workaround, or it's an issue with my environment.
> Automatic merge went well; stopped before committing as requested
> git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%an <%ae>
> git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%h [%an] %s
> Traceback (most recent call last):
>   File "./dev/merge_spark_pr.py", line 332, in 
> merge_hash = merge_pr(pr_num, target_ref)
>   File "./dev/merge_spark_pr.py", line 156, in merge_pr
> run_cmd(['git', 'commit', '--author="%s"' % primary_author] + 
> merge_message_flags)
>   File "./dev/merge_spark_pr.py", line 77, in run_cmd
> print " ".join(cmd)






[jira] [Resolved] (SPARK-1713) YarnAllocationHandler starts a thread for every executor it runs

2014-09-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1713.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> YarnAllocationHandler starts a thread for every executor it runs
> 
>
> Key: SPARK-1713
> URL: https://issues.apache.org/jira/browse/SPARK-1713
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.2.0
>
>







[jira] [Issue Comment Deleted] (SPARK-1484) MLlib should warn if you are using an iterative algorithm on non-cached data

2014-09-10 Thread Aaron Staple (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Staple updated SPARK-1484:

Comment: was deleted

(was: https://github.com/apache/spark/pull/2347)

> MLlib should warn if you are using an iterative algorithm on non-cached data
> 
>
> Key: SPARK-1484
> URL: https://issues.apache.org/jira/browse/SPARK-1484
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Matei Zaharia
>
> Not sure what the best way to warn is, but even printing to the log is 
> probably fine. We may want to print at the end of the training run as well as 
> the beginning to make it more visible.






[jira] [Commented] (SPARK-1484) MLlib should warn if you are using an iterative algorithm on non-cached data

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128800#comment-14128800
 ] 

Apache Spark commented on SPARK-1484:
-

User 'staple' has created a pull request for this issue:
https://github.com/apache/spark/pull/2347

> MLlib should warn if you are using an iterative algorithm on non-cached data
> 
>
> Key: SPARK-1484
> URL: https://issues.apache.org/jira/browse/SPARK-1484
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Matei Zaharia
>
> Not sure what the best way to warn is, but even printing to the log is 
> probably fine. We may want to print at the end of the training run as well as 
> the beginning to make it more visible.






[jira] [Resolved] (SPARK-3363) [SQL] Type Coercion should support every type to have null value

2014-09-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3363.
-
   Resolution: Fixed
Fix Version/s: 1.2.0
     Assignee: Adrian Wang

> [SQL] Type Coercion should support every type to have null value
> 
>
> Key: SPARK-3363
> URL: https://issues.apache.org/jira/browse/SPARK-3363
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Adrian Wang
>Assignee: Adrian Wang
> Fix For: 1.2.0
>
>
> Current implementation only support numeric(ByteType, ShortType, IntegerType, 
> LongType, FloatType, DoubleType, DecimalType) and boolean






[jira] [Commented] (SPARK-1484) MLlib should warn if you are using an iterative algorithm on non-cached data

2014-09-10 Thread Aaron Staple (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128798#comment-14128798
 ] 

Aaron Staple commented on SPARK-1484:
-

https://github.com/apache/spark/pull/2347

> MLlib should warn if you are using an iterative algorithm on non-cached data
> 
>
> Key: SPARK-1484
> URL: https://issues.apache.org/jira/browse/SPARK-1484
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Matei Zaharia
>
> Not sure what the best way to warn is, but even printing to the log is 
> probably fine. We may want to print at the end of the training run as well as 
> the beginning to make it more visible.






[jira] [Resolved] (SPARK-3362) [SQL] bug in CaseWhen resolve

2014-09-10 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3362.
-
   Resolution: Fixed
Fix Version/s: 1.2.0
     Assignee: Adrian Wang

> [SQL] bug in CaseWhen resolve
> -
>
> Key: SPARK-3362
> URL: https://issues.apache.org/jira/browse/SPARK-3362
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Adrian Wang
>Assignee: Adrian Wang
> Fix For: 1.2.0
>
>
> select case x when 0 then null else y/x end from t;
> will lead to an match error in toHiveString() when output, because spark 
> would consider the output is always NullType, which is not right.






[jira] [Updated] (SPARK-3377) Don't mix metrics from different applications otherwise we cannot distinguish

2014-09-10 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-3377:
--
Description: 
I'm using Spark's codahale-based MetricsSystem with JMX or Graphite, and I see 
the following 2 problems.

(1) When applications with the same spark.app.name run on the cluster at the 
same time, their metric names are mixed. For instance, if 2+ applications are 
running on the cluster at the same time, each application emits a metric with 
the same name, like "SparkPi.DAGScheduler.stage.failedStages", and Graphite 
cannot tell which application a metric belongs to.

(2) When 2+ executors run on the same machine, the JVM metrics of the 
executors are mixed. For instance, 2+ executors running on the same node can 
emit the same metric name, "jvm.memory", and Graphite cannot tell which 
executor a metric comes from.

  was:
I'm using codahale base MetricsSystem of Spark with JMX or Graphite, and I saw 
following 2 problems.

(1) When applications which have same spark.app.name run on cluster at the same 
time, some metrics names jumble up together. e.g, 
SparkPi.DAGScheduler.stage.failedStages jumble.

(2) When 2+ executors run on the same machine, JVM metrics of each executors 
jumble. e.g, We current implementation cannot distinguish metric "jvm.memory" 
is for which executor.
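
One possible disambiguation can be sketched as a toy naming helper. The appId/executorId prefixing scheme and class below are hypothetical, not Spark's actual fix:

```java
// Toy sketch of one way to avoid the collision described above: qualify each
// metric name with the application id and executor id. Naming scheme and
// class are hypothetical; they only illustrate the disambiguation idea.
public class MetricNames {
    static String qualified(String appId, String executorId, String metric) {
        return String.join(".", appId, executorId, metric);
    }

    public static void main(String[] args) {
        // Two apps both named "SparkPi" no longer emit the same metric name.
        String a = qualified("app-20140910-0001", "exec-1",
                             "DAGScheduler.stage.failedStages");
        String b = qualified("app-20140910-0002", "exec-1",
                             "DAGScheduler.stage.failedStages");
        System.out.println(a.equals(b));  // → false
    }
}
```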


> Don't mix metrics from different applications otherwise we cannot distinguish
> -
>
> Key: SPARK-3377
> URL: https://issues.apache.org/jira/browse/SPARK-3377
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Kousuke Saruta
>Priority: Critical
>
> I'm using Spark's codahale-based MetricsSystem with JMX or Graphite, and I 
> see the following 2 problems.
> (1) When applications with the same spark.app.name run on the cluster at the 
> same time, their metric names are mixed. For instance, if 2+ applications 
> are running on the cluster at the same time, each application emits a metric 
> with the same name, like "SparkPi.DAGScheduler.stage.failedStages", and 
> Graphite cannot tell which application a metric belongs to.
> (2) When 2+ executors run on the same machine, the JVM metrics of the 
> executors are mixed. For instance, 2+ executors running on the same node can 
> emit the same metric name, "jvm.memory", and Graphite cannot tell which 
> executor a metric comes from.






[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128778#comment-14128778
 ] 

Matthew Farrellee commented on SPARK-3470:
--

I stand corrected.

http://spark.apache.org/docs/1.0.2/ does say "Spark runs on Java 6+ and Python 
2.6+. For the Scala API, Spark 1.0.2 uses Scala 2.10. You will need to use a 
compatible Scala version (2.10.x)."

Since Java 6 is EOL, I hope Spark will move to Java 7+ soon.

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>return br.readLine();
> }
> {code}






[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128764#comment-14128764
 ] 

Sean Owen commented on SPARK-3470:
--

Spark retains compatibility with Java 6 on purpose AFAIK. But implementing 
Closeable is fine and also works with try-with-resources in Java 7, yes.
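
Sean's point can be sketched directly. MockContext is a hypothetical stand-in for JavaSparkContext: Closeable extends AutoCloseable, so implementing Closeable alone is enough for try-with-resources on Java 7+ while keeping Java 6 source compatibility.

```java
import java.io.Closeable;

// Minimal sketch: a Closeable-only class still works in try-with-resources,
// because java.io.Closeable extends java.lang.AutoCloseable on Java 7+.
public class ClosingDemo {
    static class MockContext implements Closeable {
        boolean stopped = false;
        @Override public void close() { stopped = true; } // would call stop()
    }

    public static void main(String[] args) {
        MockContext ref;
        try (MockContext ctx = new MockContext()) {
            ref = ctx;
        } // close() runs automatically here
        System.out.println("stopped=" + ref.stopped);  // → stopped=true
    }
}
```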

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>return br.readLine();
> }
> {code}






[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128751#comment-14128751
 ] 

Matthew Farrellee commented on SPARK-3470:
--

While you can implement Closeable in Java 7+ and use try (Closeable c = new 
...) { ... } (at least with OpenJDK 1.8), since Spark targets Java 7+, why not 
just use AutoCloseable?

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>return br.readLine();
> }
> {code}






[jira] [Created] (SPARK-3475) dev/merge_spark_pr.py fails on mac

2014-09-10 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-3475:


 Summary: dev/merge_spark_pr.py fails on mac 
 Key: SPARK-3475
 URL: https://issues.apache.org/jira/browse/SPARK-3475
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.2.0
Reporter: Thomas Graves


commit 
https://github.com/apache/spark/commit/4f4a9884d9268ba9808744b3d612ac23c75f105a#diff-c321b6c82ebb21d8fd225abea9b7b74c
 added a print statement in the run command. When I try to run it on a Mac, it 
errors out when it hits these print statements. Perhaps there is a workaround, 
or it's an issue with my environment.

Automatic merge went well; stopped before committing as requested
git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%an <%ae>
git log HEAD..PR_TOOL_MERGE_PR_2276 --pretty=format:%h [%an] %s
Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 332, in 
merge_hash = merge_pr(pr_num, target_ref)
  File "./dev/merge_spark_pr.py", line 156, in merge_pr
run_cmd(['git', 'commit', '--author="%s"' % primary_author] + 
merge_message_flags)
  File "./dev/merge_spark_pr.py", line 77, in run_cmd
print " ".join(cmd)







[jira] [Resolved] (SPARK-3286) Cannot view ApplicationMaster UI when Yarn's url scheme is https

2014-09-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-3286.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> Cannot view ApplicationMaster UI when Yarn's url scheme is https
> 
>
> Key: SPARK-3286
> URL: https://issues.apache.org/jira/browse/SPARK-3286
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI, YARN
>Affects Versions: 1.0.2
>Reporter: Benoy Antony
> Fix For: 1.2.0
>
> Attachments: SPARK-3286-branch-1-0.patch, SPARK-3286.patch
>
>
> The Spark ApplicationMaster starts its web UI at http://:port.
> When the Spark ApplicationMaster registers its URL with the Resource Manager, 
> the URL does not contain a URI scheme.
> If the URL scheme is absent, the Resource Manager's web app proxy will use the 
> HTTP Policy of the Resource Manager (YARN-1553).
> If the HTTP Policy of the Resource Manager is https, then the web app proxy 
> will try to access https://:port.
> This will result in an error.
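The fix's underlying idea is simple: make sure the tracking URL the ApplicationMaster registers already carries an explicit scheme, so the proxy never substitutes its own HTTP policy. A minimal sketch in plain Java (`ensureScheme` and the host names are hypothetical, not Spark's actual code):

```java
public class Main {
    // Hypothetical helper: prepend a scheme only when the URL lacks one, so the
    // YARN web app proxy does not fall back to the Resource Manager's HTTP policy.
    static String ensureScheme(String url, String defaultScheme) {
        // "scheme://" prefix per RFC 3986: ALPHA *(ALPHA / DIGIT / "+" / "-" / ".")
        if (url.matches("^[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            return url;  // already has an explicit scheme
        }
        return defaultScheme + "://" + url;
    }

    public static void main(String[] args) {
        System.out.println(ensureScheme("am-host:4040", "http"));         // scheme added
        System.out.println(ensureScheme("https://am-host:4040", "http")); // left alone
    }
}
```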






[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll

2014-09-10 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128682#comment-14128682
 ] 

Cody Koeninger commented on SPARK-3462:
---

Tested this on a cluster against unions of 2 and 3 parquet tables, around 
2 billion records.

Seems like a big performance win: previously, simple queries (e.g. count, approx 
distinct count of a single column) against a union of 2 tables were taking 5 to 
10x as long as against a single table.

Now it's closer to linear, e.g. 35 secs for one table, 74 for a union of 2, etc.
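The win comes from the standard relational identity behind the patch: a filter above a union can be pushed into each branch, filter(A ∪ B) = filter(A) ∪ filter(B), so each ParquetTableScan applies the predicate itself. A sketch of the equivalence over plain Java collections (the table contents are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Main {
    // Plan before the patch: Filter sits above Union.
    static List<Integer> filterAboveUnion(List<Integer> a, List<Integer> b,
                                          Predicate<Integer> p) {
        return Stream.concat(a.stream(), b.stream())  // Union first ...
                     .filter(p)                       // ... Filter on top
                     .collect(Collectors.toList());
    }

    // Pushed-down plan: each branch's scan filters itself.
    static List<Integer> filterPushedDown(List<Integer> a, List<Integer> b,
                                          Predicate<Integer> p) {
        return Stream.concat(a.stream().filter(p),
                             b.stream().filter(p))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> people  = Arrays.asList(25, 45, 33);  // ages in table 1
        List<Integer> people2 = Arrays.asList(39, 52, 18);  // ages in table 2
        Predicate<Integer> ageLt40 = age -> age < 40;       // the (age#4 < 40) predicate

        System.out.println(filterAboveUnion(people, people2, ageLt40));
        System.out.println(filterPushedDown(people, people2, ageLt40));
        // Both print [25, 33, 39, 18]: same rows either way, but the pushed-down
        // form lets each Parquet scan skip non-matching data itself.
    }
}
```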

> parquet pushdown for unionAll
> -
>
> Key: SPARK-3462
> URL: https://issues.apache.org/jira/browse/SPARK-3462
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cody Koeninger
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html
> // single table, pushdown
> scala> p.where('age < 40).select('name)
> res36: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[97] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 < 
> 40)]
> // union of 2 tables, no pushdown
> scala> b.where('age < 40).select('name)
> res37: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[99] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  Filter (age#4 < 40)
>   Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation 
> /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, 
> mapred-default.xml, mapred-site.xml), 
> org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ]  






[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128552#comment-14128552
 ] 

Apache Spark commented on SPARK-3470:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2346

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>// use ctx here; ctx.stop() is invoked automatically on exit
> }
> {code}






[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128478#comment-14128478
 ] 

Apache Spark commented on SPARK-3462:
-

User 'koeninger' has created a pull request for this issue:
https://github.com/apache/spark/pull/2345

> parquet pushdown for unionAll
> -
>
> Key: SPARK-3462
> URL: https://issues.apache.org/jira/browse/SPARK-3462
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cody Koeninger
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html
> // single table, pushdown
> scala> p.where('age < 40).select('name)
> res36: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[97] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 < 
> 40)]
> // union of 2 tables, no pushdown
> scala> b.where('age < 40).select('name)
> res37: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[99] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  Filter (age#4 < 40)
>   Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation 
> /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, 
> mapred-default.xml, mapred-site.xml), 
> org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ]  






[jira] [Commented] (SPARK-3462) parquet pushdown for unionAll

2014-09-10 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128451#comment-14128451
 ] 

Cody Koeninger commented on SPARK-3462:
---

Created a PR for feedback.

https://github.com/apache/spark/pull/2345

Seems to do the right thing locally; will see about testing on a cluster.

> parquet pushdown for unionAll
> -
>
> Key: SPARK-3462
> URL: https://issues.apache.org/jira/browse/SPARK-3462
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cody Koeninger
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/parquet-predicate-projection-pushdown-into-unionAll-td8339.html
> // single table, pushdown
> scala> p.where('age < 40).select('name)
> res36: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[97] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  ParquetTableScan [name#3,age#4], (ParquetRelation /var/tmp/people, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), [(age#4 < 
> 40)]
> // union of 2 tables, no pushdown
> scala> b.where('age < 40).select('name)
> res37: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[99] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [name#3]
>  Filter (age#4 < 40)
>   Union [ParquetTableScan [name#3,age#4,phones#5], (ParquetRelation 
> /var/tmp/people, Some(Configuration: core-default.xml, core-site.xml, 
> mapred-default.xml, mapred-site.xml), 
> org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ,ParquetTableScan [name#0,age#1,phones#2], (ParquetRelation /var/tmp/people2, 
> Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, 
> mapred-site.xml), org.apache.spark.sql.SQLContext@6d7e79f6, []), []
> ]  






[jira] [Commented] (SPARK-3407) Add Date type support

2014-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128421#comment-14128421
 ] 

Apache Spark commented on SPARK-3407:
-

User 'adrian-wang' has created a pull request for this issue:
https://github.com/apache/spark/pull/2344

> Add Date type support
> -
>
> Key: SPARK-3407
> URL: https://issues.apache.org/jira/browse/SPARK-3407
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Cheng Hao
>







[jira] [Comment Edited] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST

2014-09-10 Thread Chunjun Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128303#comment-14128303
 ] 

Chunjun Xiao edited comment on SPARK-3474 at 9/10/14 10:09 AM:
---

I agree we should still support old variable names.
The problem is that if the user sets the old variable name (SPARK_MASTER_IP) and 
starts the Spark master via "service spark-master start", SPARK_MASTER_IP will 
fail to work.
We should fix that, right?


was (Author: chunjun.xiao):
[~srowen]
I agree we should still support old variable names.
The problem is, if the user takes the old variable name (SPARK_MASTER_IP) and 
start spark master like "service spark-master start", the SPARK_MASTER_IP will 
fail to work.
We should fix it, right?

> Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST
> 
>
> Key: SPARK-3474
> URL: https://issues.apache.org/jira/browse/SPARK-3474
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.1
>Reporter: Chunjun Xiao
>
> There's some inconsistency regarding the env variable used to specify the 
> Spark master host server.
> In the Spark source code (MasterArguments.scala), the env variable is 
> "SPARK_MASTER_HOST", while in the shell scripts (e.g., spark-env.sh, 
> start-master.sh), it's named "SPARK_MASTER_IP".
> This will introduce an issue in some cases, e.g., if the Spark master is started 
> via "service spark-master start", which is built based on the latest Bigtop 
> (refer to bigtop/spark-master.svc).
> In this case, "SPARK_MASTER_IP" will have no effect.
> I suggest we change SPARK_MASTER_IP in the shell scripts to SPARK_MASTER_HOST.






[jira] [Commented] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST

2014-09-10 Thread Chunjun Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128303#comment-14128303
 ] 

Chunjun Xiao commented on SPARK-3474:
-

[~srowen]
I agree we should still support old variable names.
The problem is that if the user sets the old variable name (SPARK_MASTER_IP) and 
starts the Spark master via "service spark-master start", SPARK_MASTER_IP will 
fail to work.
We should fix that, right?

> Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST
> 
>
> Key: SPARK-3474
> URL: https://issues.apache.org/jira/browse/SPARK-3474
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.1
>Reporter: Chunjun Xiao
>
> There's some inconsistency regarding the env variable used to specify the 
> Spark master host server.
> In the Spark source code (MasterArguments.scala), the env variable is 
> "SPARK_MASTER_HOST", while in the shell scripts (e.g., spark-env.sh, 
> start-master.sh), it's named "SPARK_MASTER_IP".
> This will introduce an issue in some cases, e.g., if the Spark master is started 
> via "service spark-master start", which is built based on the latest Bigtop 
> (refer to bigtop/spark-master.svc).
> In this case, "SPARK_MASTER_IP" will have no effect.
> I suggest we change SPARK_MASTER_IP in the shell scripts to SPARK_MASTER_HOST.






[jira] [Commented] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST

2014-09-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128300#comment-14128300
 ] 

Sean Owen commented on SPARK-3474:
--

(You can deprecate but still support old variable names, right? So 
SPARK_MASTER_IP would have the effect of setting the new SPARK_MASTER_HOST but 
would generate a warning. You wouldn't want or need to remove the old vars 
immediately.)
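The deprecate-but-support idea can be sketched as a simple lookup: prefer SPARK_MASTER_HOST, fall back to SPARK_MASTER_IP with a warning. The variable names are from this thread; the lookup logic itself is illustrative, not Spark's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Prefer the new name; honor the deprecated one with a warning; default last.
    static String masterHost(Map<String, String> env) {
        String host = env.get("SPARK_MASTER_HOST");
        if (host != null) return host;
        String ip = env.get("SPARK_MASTER_IP");
        if (ip != null) {
            System.err.println("WARN: SPARK_MASTER_IP is deprecated, use SPARK_MASTER_HOST");
            return ip;
        }
        return "localhost";  // default when neither is set
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("SPARK_MASTER_IP", "10.0.0.5");        // old name only: still honored
        System.out.println(masterHost(env));           // 10.0.0.5 (plus a warning)
        env.put("SPARK_MASTER_HOST", "master.example"); // new name wins when both set
        System.out.println(masterHost(env));           // master.example
    }
}
```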

> Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST
> 
>
> Key: SPARK-3474
> URL: https://issues.apache.org/jira/browse/SPARK-3474
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.1
>Reporter: Chunjun Xiao
>
> There's some inconsistency regarding the env variable used to specify the 
> spark master host server.
> In spark source code (MasterArguments.scala), the env variable is 
> "SPARK_MASTER_HOST", while in the shell script (e.g., spark-env.sh, 
> start-master.sh), it's named "SPARK_MASTER_IP".
> This will introduce an issue in some case, e.g., if spark master is started  
> via "service spark-master start", which is built based on latest bigtop 
> (refer to bigtop/spark-master.svc).
> In this case, "SPARK_MASTER_IP" will have no effect.
> I suggest we change SPARK_MASTER_IP in the shell script to SPARK_MASTER_HOST.






[jira] [Created] (SPARK-3474) Rename the env variable SPARK_MASTER_IP to SPARK_MASTER_HOST

2014-09-10 Thread Chunjun Xiao (JIRA)
Chunjun Xiao created SPARK-3474:
---

 Summary: Rename the env variable SPARK_MASTER_IP to 
SPARK_MASTER_HOST
 Key: SPARK-3474
 URL: https://issues.apache.org/jira/browse/SPARK-3474
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.0.1
Reporter: Chunjun Xiao


There's some inconsistency regarding the env variable used to specify the Spark 
master host server.

In the Spark source code (MasterArguments.scala), the env variable is 
"SPARK_MASTER_HOST", while in the shell scripts (e.g., spark-env.sh, 
start-master.sh), it's named "SPARK_MASTER_IP".

This will introduce an issue in some cases, e.g., if the Spark master is started 
via "service spark-master start", which is built based on the latest Bigtop 
(refer to bigtop/spark-master.svc).
In this case, "SPARK_MASTER_IP" will have no effect.
I suggest we change SPARK_MASTER_IP in the shell scripts to SPARK_MASTER_HOST.







[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Shay Rojansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128285#comment-14128285
 ] 

Shay Rojansky commented on SPARK-3470:
--

Good point about AutoCloseable. Yes, the idea is for Closeable to call stop(). 
I'd submit a PR myself but I don't know any Scala whatsoever...

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>// use ctx here; ctx.stop() is invoked automatically on exit
> }
> {code}






[jira] [Commented] (SPARK-3470) Have JavaSparkContext implement Closeable/AutoCloseable

2014-09-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128232#comment-14128232
 ] 

Sean Owen commented on SPARK-3470:
--

If you implement {{AutoCloseable}}, then Spark will not work on Java 6, since 
this class does not exist before Java 7. Implementing {{Closeable}} is fine of 
course. I assume it would just call {{stop()}}
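Sketched as a toy (FakeSparkContext stands in for JavaSparkContext; this is not Spark's code), the whole proposal is a one-line close() delegating to stop(). Since java.io.Closeable exists on Java 6 and extends AutoCloseable on Java 7+, implementing Closeable keeps Java 6 compatibility while try-with-resources still accepts it:

```java
import java.io.Closeable;

public class Main {
    // Toy stand-in for JavaSparkContext. Implementing java.io.Closeable (present
    // on Java 6) rather than AutoCloseable (Java 7 only) keeps old JVMs working.
    static class FakeSparkContext implements Closeable {
        boolean stopped = false;
        void stop() { stopped = true; }            // the existing shutdown method
        @Override public void close() { stop(); }  // the entire proposed change
    }

    public static void main(String[] args) {
        FakeSparkContext observed;
        try (FakeSparkContext ctx = new FakeSparkContext()) {
            observed = ctx;                        // do work with ctx here
        }
        // close() ran automatically at the end of the try block:
        System.out.println("stopped = " + observed.stopped);
    }
}
```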

> Have JavaSparkContext implement Closeable/AutoCloseable
> ---
>
> Key: SPARK-3470
> URL: https://issues.apache.org/jira/browse/SPARK-3470
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Shay Rojansky
>Priority: Minor
>
> After discussion in SPARK-2972, it seems like a good idea to allow Java 
> developers to use Java 7 automatic resource management with JavaSparkContext, 
> like so:
> {code:java}
> try (JavaSparkContext ctx = new JavaSparkContext(...)) {
>// use ctx here; ctx.stop() is invoked automatically on exit
> }
> {code}






[jira] [Closed] (SPARK-3473) Expose task status when converting TaskInfo into JSON representation

2014-09-10 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta closed SPARK-3473.
-
Resolution: Won't Fix

Task status can be inferred from the "failed" field and "finishTime", so I'm closing this.

> Expose task status when converting TaskInfo into JSON representation
> 
>
> Key: SPARK-3473
> URL: https://issues.apache.org/jira/browse/SPARK-3473
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Kousuke Saruta
>
> When TaskInfo is converted into JSON by JsonProtocol, status is lost.






[jira] [Created] (SPARK-3473) Expose task status when converting TaskInfo into JSON representation

2014-09-10 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-3473:
-

 Summary: Expose task status when converting TaskInfo into JSON 
representation
 Key: SPARK-3473
 URL: https://issues.apache.org/jira/browse/SPARK-3473
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Kousuke Saruta


When TaskInfo is converted into JSON by JsonProtocol, status is lost.






[jira] [Resolved] (SPARK-3326) can't access a static variable after init in mapper

2014-09-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li resolved SPARK-3326.

Resolution: Not a Problem

> can't access a static variable after init in mapper
> ---
>
> Key: SPARK-3326
> URL: https://issues.apache.org/jira/browse/SPARK-3326
> Project: Spark
>  Issue Type: Bug
> Environment: CDH5.1.0
> Spark1.0.0
>Reporter: Gavin Zhang
>
> I wrote an object like:
> object Foo {
>   private var bar: Bar = null
>   def init(b: Bar) {
>     this.bar = b
>   }
>   def getSome() = {
>     bar.someDef()
>   }
> }
> In the Spark main def, I read some text from HDFS and init this object, and 
> after that call getSome().
> I was successful with this code:
> sc.textFile(args(0)).take(10).map(println(Foo.getSome()))
> However, when I changed it to write the output to HDFS, I found the bar 
> variable in the Foo object is null:
> sc.textFile(args(0)).map(line=>Foo.getSome()).saveAsTextFile(args(1))
> WHY?
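The reason this is "Not a Problem": the map closure runs in executor JVMs, and singleton state initialized on the driver is not serialized with the task, so each executor sees an uninitialized Foo. A toy Java sketch of the mechanism (names are hypothetical; the "fresh executor JVM" is faked by resetting the static field):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Main {
    // Simplified model of the confusion: Foo's static state is per-JVM.
    static class Foo {
        static String bar = null;                 // set only on the "driver"
        static void init(String b) { bar = b; }
        static String getSome() { return bar == null ? "null!" : bar; }
    }

    static class Task implements Serializable {
        String run() { return Foo.getSome(); }    // references the static; doesn't capture it
    }

    public static void main(String[] args) throws Exception {
        Foo.init("hdfs-config");                  // "driver" initializes the singleton
        Task task = new Task();

        // Ship the task over the wire, as Spark would.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(task);
        out.close();

        Foo.bar = null;                           // simulate a fresh executor JVM
        Task onExecutor = (Task) new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readObject();
        System.out.println(onExecutor.run());     // static state did not travel with the task
    }
}
```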






[jira] [Resolved] (SPARK-3345) Do correct parameters for ShuffleFileGroup

2014-09-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li resolved SPARK-3345.

   Resolution: Fixed
Fix Version/s: (was: 1.1.1)

> Do correct parameters for ShuffleFileGroup
> --
>
> Key: SPARK-3345
> URL: https://issues.apache.org/jira/browse/SPARK-3345
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 1.2.0
>
>
> In the method newFileGroup of class FileShuffleBlockManager, the parameters 
> for creating a new ShuffleFileGroup object are in the wrong order.
> Wrong: new ShuffleFileGroup(fileId, shuffleId, files)
> Correct: new ShuffleFileGroup(shuffleId, fileId, files)
> Because the parameters shuffleId and fileId are not used in the current code, 
> this doesn't cause a problem now. However, it should be corrected for 
> readability and to avoid future problems.






[jira] [Resolved] (SPARK-3364) Zip equal-length but unequally-partition

2014-09-10 Thread Guoqiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li resolved SPARK-3364.

Resolution: Fixed

> Zip equal-length but unequally-partition
> 
>
> Key: SPARK-3364
> URL: https://issues.apache.org/jira/browse/SPARK-3364
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.2
>Reporter: Kevin Jung
> Fix For: 1.1.0
>
>
> ZippedRDD loses some elements after zipping RDDs that have equal numbers of 
> partitions but unequal numbers of elements in each partition.
> This can happen when a user creates an RDD via sc.textFile(path, partitionNumbers) 
> with a physically unbalanced HDFS file.
> {noformat}
> var x = sc.parallelize(1 to 9,3)
> var y = sc.parallelize(Array(1,1,1,1,1,2,2,3,3),3).keyBy(i=>i)
> var z = y.partitionBy(new RangePartitioner(3,y))
> expected
> x.zip(y).count()
> 9
> x.zip(y).collect()
> Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(1,1)), 
> (5,(1,1)), (6,(2,2)), (7,(2,2)), (8,(3,3)), (9,(3,3)))
> unexpected
> x.zip(z).count()
> 7
> x.zip(z).collect()
> Array[(Int, (Int, Int))] = Array((1,(1,1)), (2,(1,1)), (3,(1,1)), (4,(2,2)), 
> (5,(2,2)), (7,(3,3)), (8,(3,3)))
> {noformat}
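The count of 7 follows directly from how ZippedRDD pairs elements: partition-by-partition, with anything past the shorter partition's length dropped. A toy model in plain Java (partitions as nested lists; the skewed 5/2/2 layout is an assumption about what the RangePartitioner produced) reproduces the numbers from the report:

```java
import java.util.Arrays;
import java.util.List;

public class Main {
    // Pair elements partition-by-partition, like ZippedRDD: only
    // min(|x_p|, |y_p|) pairs survive in partition p.
    static int zipCount(List<List<Integer>> x, List<List<Integer>> y) {
        int pairs = 0;
        for (int p = 0; p < x.size(); p++) {
            pairs += Math.min(x.get(p).size(), y.get(p).size());
        }
        return pairs;
    }

    public static void main(String[] args) {
        // x = sc.parallelize(1 to 9, 3): three balanced partitions of 3
        List<List<Integer>> x = Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6), Arrays.asList(7, 8, 9));
        // z after range partitioning: the same 9 elements, skewed as 5 / 2 / 2
        List<List<Integer>> z = Arrays.asList(
                Arrays.asList(1, 1, 1, 1, 1), Arrays.asList(2, 2), Arrays.asList(3, 3));
        System.out.println(zipCount(x, x));  // 9: balanced vs balanced, nothing lost
        System.out.println(zipCount(x, z));  // 7 = min(3,5) + min(3,2) + min(3,2)
    }
}
```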






[jira] [Closed] (SPARK-3472) Option to take top n elements (unsorted)

2014-09-10 Thread Kanwaljit Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanwaljit Singh closed SPARK-3472.
--
Resolution: Invalid

> Option to take top n elements (unsorted)
> 
>
> Key: SPARK-3472
> URL: https://issues.apache.org/jira/browse/SPARK-3472
> Project: Spark
>  Issue Type: New Feature
>Reporter: Kanwaljit Singh
>Priority: Minor
>



