[jira] [Commented] (SPARK-8781) Published POMs are no longer effective POMs

2015-07-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611571#comment-14611571
 ] 

Sean Owen commented on SPARK-8781:
--

Does this affect release artifacts or just the snapshot?
That commit doesn't look related, since it doesn't touch the lines you reference 
here. Are you sure? 
If it's 'fixed' by changing it, maybe something else is at work?

> Published POMs are no longer effective POMs
> 
>
> Key: SPARK-8781
> URL: https://issues.apache.org/jira/browse/SPARK-8781
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.2, 1.4.1, 1.5.0
>Reporter: Konstantin Shaposhnikov
>
> The POMs published to the Maven repository are no longer effective POMs. E.g. 
> in 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/1.4.2-SNAPSHOT/spark-core_2.11-1.4.2-20150702.043114-52.pom:
> {noformat}
> ...
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-launcher_${scala.binary.version}</artifactId>
>   <version>${project.version}</version>
> </dependency>
> ...
> {noformat}
> while it should be
> {noformat}
> ...
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-launcher_2.11</artifactId>
>   <version>${project.version}</version>
> </dependency>
> ...
> {noformat}
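The difference between the two snippets is mechanical, so it is easy to check a published POM for the regression. A minimal sketch in plain Python (the helper name is made up) that flags Maven `${...}` properties left uninterpolated:

```python
import re

# Properties that an effective POM should have substituted away.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def unresolved_properties(pom_text):
    """Return the set of Maven property names left unresolved in a POM snippet."""
    return set(PLACEHOLDER.findall(pom_text))

published = "<artifactId>spark-launcher_${scala.binary.version}</artifactId>"
expected = "<artifactId>spark-launcher_2.11</artifactId>"

print(unresolved_properties(published))  # {'scala.binary.version'}
print(unresolved_properties(expected))   # set()
```

Running this over each `<artifactId>` of a published POM would catch the unresolved `scala.binary.version` described above.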
> The following commits are most likely the cause of it:
> - for branch-1.3: 
> https://github.com/apache/spark/commit/ce137b8ed3b240b7516046699ac96daa55ddc129
> - for branch-1.4: 
> https://github.com/apache/spark/commit/84da653192a2d9edb82d0dbe50f577c4dc6a0c78
> - for master: 
> https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724
> On branch-1.4, reverting the commit fixed the issue.
> See SPARK-3812 for additional details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6573) Convert inbound NaN values as null

2015-07-01 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611565#comment-14611565
 ] 

Josh Rosen commented on SPARK-6573:
---

NaN can lead to confusing exceptions during sorting if it appears in a column.  
I just ran into an issue where Sort threw a "Comparison method violates its 
general contract!" error for data containing NaN columns.  See my comments at 
https://github.com/apache/spark/pull/7179#discussion_r33749911
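The root cause of that error is that NaN is unordered: every comparison against it is false, so a three-way comparator derived from `<` and `>` violates the total-order contract that sorting assumes. A quick illustration in plain Python (the Java TimSort error quoted above arises from the same inconsistency):

```python
nan = float("nan")

# NaN compares false against everything, including itself.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# A naive comparator therefore reports NaN as "equal" to any value,
# which is inconsistent (nan == 1.0 and nan == 2.0, but 1.0 != 2.0).
def compare(a, b):
    if a < b:
        return -1
    if a > b:
        return 1
    return 0

print(compare(nan, 1.0), compare(1.0, nan), compare(1.0, 2.0))  # 0 0 -1
```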

> Convert inbound NaN values as null
> --
>
> Key: SPARK-6573
> URL: https://issues.apache.org/jira/browse/SPARK-6573
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Fabian Boehnlein
>
> In pandas it is common to use numpy.nan as the null value, for missing data 
> or whatever.
> http://pandas.pydata.org/pandas-docs/dev/gotchas.html#nan-integer-na-values-and-na-type-promotions
> http://stackoverflow.com/questions/17534106/what-is-the-difference-between-nan-and-none
> http://pandas.pydata.org/pandas-docs/dev/missing_data.html#filling-missing-values-fillna
> createDataFrame, however, only works with None as the null value, parsing it 
> as None in the RDD.
> I suggest adding support for np.nan values in pandas DataFrames.
> Current stack trace when creating a DataFrame from object-type columns with 
> np.nan values (which are floats):
> {code}
> TypeError Traceback (most recent call last)
> <ipython-input-...> in <module>()
> > 1 sqldf = sqlCtx.createDataFrame(df_, schema=schema)
> /opt/spark/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/context.py in 
> createDataFrame(self, data, schema, samplingRatio)
> 339 schema = self._inferSchema(data.map(lambda r: 
> row_cls(*r)), samplingRatio)
> 340 
> --> 341 return self.applySchema(data, schema)
> 342 
> 343 def registerDataFrameAsTable(self, rdd, tableName):
> /opt/spark/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/context.py in 
> applySchema(self, rdd, schema)
> 246 
> 247 for row in rows:
> --> 248 _verify_type(row, schema)
> 249 
> 250 # convert python objects to sql data
> /opt/spark/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/types.py in 
> _verify_type(obj, dataType)
>1064  "length of fields (%d)" % (len(obj), 
> len(dataType.fields)))
>1065 for v, f in zip(obj, dataType.fields):
> -> 1066 _verify_type(v, f.dataType)
>1067 
>1068 _cached_cls = weakref.WeakValueDictionary()
> /opt/spark/spark-1.3.0-bin-hadoop2.4/python/pyspark/sql/types.py in 
> _verify_type(obj, dataType)
>1048 if type(obj) not in _acceptable_types[_type]:
>1049 raise TypeError("%s can not accept object in type %s"
> -> 1050 % (dataType, type(obj)))
>1051 
>1052 if isinstance(dataType, ArrayType):
> TypeError: StringType can not accept object in type <type 'float'>
> {code}
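As a workaround at the boundary, NaN cells can be mapped to None before the rows reach createDataFrame. A minimal plain-Python sketch (the helper name is hypothetical), exploiting the fact that NaN is the only float unequal to itself:

```python
def nan_to_none(rows):
    """Replace float('nan') cells with None so schema verification accepts them."""
    return [
        tuple(None if isinstance(v, float) and v != v else v for v in row)
        for row in rows
    ]

rows = [("alice", float("nan")), ("bob", "x")]
print(nan_to_none(rows))  # [('alice', None), ('bob', 'x')]
```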






[jira] [Assigned] (SPARK-8783) CTAS with WITH clause does not work

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8783:
---

Assignee: Apache Spark

> CTAS with WITH clause does not work
> ---
>
> Key: SPARK-8783
> URL: https://issues.apache.org/jira/browse/SPARK-8783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Keuntae Park
>Assignee: Apache Spark
>Priority: Minor
>
> The following CTAS query with a WITH clause
> {code}
> CREATE TABLE with_table1 AS
> WITH T AS (
>   SELECT *
>   FROM table1
> )
> SELECT *
> FROM T
> {code}
> produces the following error:
> {code}
> no such table T; line 7 pos 5
> org.apache.spark.sql.AnalysisException: no such table T; line 7 pos 5
> ...
> {code}
> I think the WITH clause within CTAS is not handled properly.
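For reference, the expected semantics — the CTE being visible to the SELECT that feeds the CTAS — can be checked against another engine. A sketch using Python's built-in sqlite3 (table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,)])

# Same shape of query as in the report: a CTAS whose SELECT reads from a CTE.
conn.execute("""
    CREATE TABLE with_table1 AS
    WITH T AS (SELECT * FROM table1)
    SELECT * FROM T
""")

print(conn.execute("SELECT COUNT(*) FROM with_table1").fetchone())  # (2,)
```

SQLite resolves T without complaint, which is the behavior the reporter expects from Spark SQL.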






[jira] [Commented] (SPARK-8783) CTAS with WITH clause does not work

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611556#comment-14611556
 ] 

Apache Spark commented on SPARK-8783:
-

User 'sirpkt' has created a pull request for this issue:
https://github.com/apache/spark/pull/7180

> CTAS with WITH clause does not work
> ---
>
> Key: SPARK-8783
> URL: https://issues.apache.org/jira/browse/SPARK-8783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Keuntae Park
>Priority: Minor
>
> The following CTAS query with a WITH clause
> {code}
> CREATE TABLE with_table1 AS
> WITH T AS (
>   SELECT *
>   FROM table1
> )
> SELECT *
> FROM T
> {code}
> produces the following error:
> {code}
> no such table T; line 7 pos 5
> org.apache.spark.sql.AnalysisException: no such table T; line 7 pos 5
> ...
> {code}
> I think the WITH clause within CTAS is not handled properly.






[jira] [Assigned] (SPARK-8783) CTAS with WITH clause does not work

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8783:
---

Assignee: (was: Apache Spark)

> CTAS with WITH clause does not work
> ---
>
> Key: SPARK-8783
> URL: https://issues.apache.org/jira/browse/SPARK-8783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Keuntae Park
>Priority: Minor
>
> The following CTAS query with a WITH clause
> {code}
> CREATE TABLE with_table1 AS
> WITH T AS (
>   SELECT *
>   FROM table1
> )
> SELECT *
> FROM T
> {code}
> produces the following error:
> {code}
> no such table T; line 7 pos 5
> org.apache.spark.sql.AnalysisException: no such table T; line 7 pos 5
> ...
> {code}
> I think the WITH clause within CTAS is not handled properly.






[jira] [Commented] (SPARK-8708) MatrixFactorizationModel.predictAll() populates single partition only

2015-07-01 Thread Antony Mayi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611555#comment-14611555
 ] 

Antony Mayi commented on SPARK-8708:


bq. Antony Mayi In your real case, how many partitions did ALS.predictAll 
return?
512 partitions, of which 511 are empty; the single remaining one holds all 13M ratings.

> MatrixFactorizationModel.predictAll() populates single partition only
> -
>
> Key: SPARK-8708
> URL: https://issues.apache.org/jira/browse/SPARK-8708
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Antony Mayi
>
> When using mllib.recommendation.ALS, the RDD returned by .predictAll() has all 
> values pushed into a single partition despite quite high parallelism.
> This degrades the performance of further processing. I can obviously run 
> .partitionBy() to balance it, but that's still too costly (e.g. if running 
> .predictAll() in a loop for thousands of products); it should rather be 
> possible to do it somehow on the model (automatically).
> Below is an example on a tiny sample (the same happens on a large dataset):
> {code:title=pyspark}
> >>> r1 = (1, 1, 1.0)
> >>> r2 = (1, 2, 2.0)
> >>> r3 = (2, 1, 2.0)
> >>> r4 = (2, 2, 2.0)
> >>> r5 = (3, 1, 1.0)
> >>> ratings = sc.parallelize([r1, r2, r3, r4, r5], 5)
> >>> ratings.getNumPartitions()
> 5
> >>> users = ratings.map(itemgetter(0)).distinct()
> >>> model = ALS.trainImplicit(ratings, 1, seed=10)
> >>> predictions_for_2 = model.predictAll(users.map(lambda u: (u, 2)))
> >>> predictions_for_2.glom().map(len).collect()
> [0, 0, 3, 0, 0]
> {code}
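The glom() output above can be summarized into a simple skew metric. A plain-Python sketch (the function name is made up):

```python
def max_partition_share(sizes):
    """Fraction of all records held by the most loaded partition (1.0 = fully skewed)."""
    total = sum(sizes)
    return max(sizes) / total if total else 0.0

print(max_partition_share([0, 0, 3, 0, 0]))  # 1.0: every record sits in one partition
print(max_partition_share([1, 1, 1, 1, 1]))  # 0.2: perfectly balanced over 5 partitions
```

Applied to the reporter's real run, this metric would be 1.0 for the 512-partition case as well.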






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611553#comment-14611553
 ] 

hujiayin commented on SPARK-5682:
-

Since encrypted shuffle in Spark focuses on the common module, it may not be 
good to use the Hadoop API. On the other hand, an AES solution is a bit heavy 
for encoding/decoding live streaming data. 

> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle process 
> safer for data. This feature is necessary in Spark. AES is a specification 
> for the encryption of electronic data; CTR is one of its five common modes. 
> We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to 
> enable Spark encrypted shuffle; these are also used in Hadoop's encrypted 
> shuffle. JceAesCtrCryptoCodec uses encryption algorithms provided by the JDK, 
> while OpensslAesCtrCryptoCodec uses encryption algorithms provided by 
> OpenSSL. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Created] (SPARK-8783) CTAS with WITH clause does not work

2015-07-01 Thread Keuntae Park (JIRA)
Keuntae Park created SPARK-8783:
---

 Summary: CTAS with WITH clause does not work
 Key: SPARK-8783
 URL: https://issues.apache.org/jira/browse/SPARK-8783
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Keuntae Park
Priority: Minor


The following CTAS query with a WITH clause
{code}
CREATE TABLE with_table1 AS
WITH T AS (
  SELECT *
  FROM table1
)
SELECT *
FROM T
{code}
produces the following error:
{code}
no such table T; line 7 pos 5
org.apache.spark.sql.AnalysisException: no such table T; line 7 pos 5
...
{code}

I think the WITH clause within CTAS is not handled properly.






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611547#comment-14611547
 ] 

liyunzhang_intel commented on SPARK-5682:
-

[~hujiayin]: thanks for your comment.

This feature is not based on Hadoop 2.6; it was based on Hadoop 2.6 only in 
the original design. The latest design doc (20150506) shows that there are now 
two ways to implement encrypted shuffle in Spark; currently we only implement 
it on the Spark-on-YARN framework. One is based on [Chimera (a project that 
strips the CryptoInputStream/CryptoOutputStream code from Hadoop to facilitate 
AES-NI based data encryption in other 
projects)|https://github.com/intel-hadoop/chimera] (see 
https://github.com/apache/spark/pull/5307). In the other, we implement all the 
crypto classes such as CryptoInputStream/CryptoOutputStream in Scala under the 
core/src/main/scala/org/apache/spark/crypto/ package (see 
https://github.com/apache/spark/pull/4491).

As for importing the Hadoop API into Spark: if the interface of a Hadoop class 
is public and stable, it can be used in Spark.
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/classification/InterfaceStability.html
 says:
{quote}
Incompatible changes must not be made to classes marked as stable.
{quote}
which means that once a class is marked stable, later releases will not change 
it.





> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle process 
> safer for data. This feature is necessary in Spark. AES is a specification 
> for the encryption of electronic data; CTR is one of its five common modes. 
> We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to 
> enable Spark encrypted shuffle; these are also used in Hadoop's encrypted 
> shuffle. JceAesCtrCryptoCodec uses encryption algorithms provided by the JDK, 
> while OpensslAesCtrCryptoCodec uses encryption algorithms provided by 
> OpenSSL. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Assigned] (SPARK-8782) GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8782:
---

Assignee: Josh Rosen  (was: Apache Spark)

> GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)
> 
>
> Key: SPARK-8782
> URL: https://issues.apache.org/jira/browse/SPARK-8782
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>
> Queries containing ORDER BY NULL currently result in a code generation 
> exception:
> {code}
>   public SpecificOrdering 
> generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
> return new SpecificOrdering(expr);
>   }
>   class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> private org.apache.spark.sql.catalyst.expressions.Expression[] 
> expressions = null;
> public 
> SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[] expr) 
> {
>   expressions = expr;
> }
> @Override
> public int compare(InternalRow a, InternalRow b) {
>   InternalRow i = null;  // Holds current row being evaluated.
>   
>   i = a;
>   final Object primitive1 = null;
>   i = b;
>   final Object primitive3 = null;
>   if (true && true) {
> // Nothing
>   } else if (true) {
> return -1;
>   } else if (true) {
> return 1;
>   } else {
> int comp = primitive1.compare(primitive3);
> if (comp != 0) {
>   return comp;
> }
>   }
>   
>   return 0;
> }
>   }
> org.codehaus.commons.compiler.CompileException: Line 29, Column 43: A method 
> named "compare" is not declared in any enclosing class nor any supertype, nor 
> through a static import
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
> {code}
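The generated comparator collapses to `primitive1.compare(primitive3)` on plain Objects because NullType has no ordering; what the codegen needs is a null-safe three-way comparison. The intended logic, sketched in plain Python (names are illustrative, not actual codegen output):

```python
def null_safe_compare(a, b):
    """Nulls sort first; two nulls compare equal; otherwise defer to the values."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    return (a > b) - (a < b)  # standard three-way compare

print(null_safe_compare(None, None))  # 0: ORDER BY NULL makes all keys equal
print(null_safe_compare(None, 5))     # -1
print(null_safe_compare(7, 5))        # 1
```

For ORDER BY NULL specifically, every key is None, so the comparator should simply return 0 for all pairs instead of emitting a call on a null Object.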






[jira] [Assigned] (SPARK-8782) GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8782:
---

Assignee: Apache Spark  (was: Josh Rosen)

> GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)
> 
>
> Key: SPARK-8782
> URL: https://issues.apache.org/jira/browse/SPARK-8782
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Blocker
>
> Queries containing ORDER BY NULL currently result in a code generation 
> exception:
> {code}
>   public SpecificOrdering 
> generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
> return new SpecificOrdering(expr);
>   }
>   class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> private org.apache.spark.sql.catalyst.expressions.Expression[] 
> expressions = null;
> public 
> SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[] expr) 
> {
>   expressions = expr;
> }
> @Override
> public int compare(InternalRow a, InternalRow b) {
>   InternalRow i = null;  // Holds current row being evaluated.
>   
>   i = a;
>   final Object primitive1 = null;
>   i = b;
>   final Object primitive3 = null;
>   if (true && true) {
> // Nothing
>   } else if (true) {
> return -1;
>   } else if (true) {
> return 1;
>   } else {
> int comp = primitive1.compare(primitive3);
> if (comp != 0) {
>   return comp;
> }
>   }
>   
>   return 0;
> }
>   }
> org.codehaus.commons.compiler.CompileException: Line 29, Column 43: A method 
> named "compare" is not declared in any enclosing class nor any supertype, nor 
> through a static import
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
> {code}






[jira] [Commented] (SPARK-8782) GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611543#comment-14611543
 ] 

Apache Spark commented on SPARK-8782:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7179

> GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)
> 
>
> Key: SPARK-8782
> URL: https://issues.apache.org/jira/browse/SPARK-8782
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>
> Queries containing ORDER BY NULL currently result in a code generation 
> exception:
> {code}
>   public SpecificOrdering 
> generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
> return new SpecificOrdering(expr);
>   }
>   class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> private org.apache.spark.sql.catalyst.expressions.Expression[] 
> expressions = null;
> public 
> SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[] expr) 
> {
>   expressions = expr;
> }
> @Override
> public int compare(InternalRow a, InternalRow b) {
>   InternalRow i = null;  // Holds current row being evaluated.
>   
>   i = a;
>   final Object primitive1 = null;
>   i = b;
>   final Object primitive3 = null;
>   if (true && true) {
> // Nothing
>   } else if (true) {
> return -1;
>   } else if (true) {
> return 1;
>   } else {
> int comp = primitive1.compare(primitive3);
> if (comp != 0) {
>   return comp;
> }
>   }
>   
>   return 0;
> }
>   }
> org.codehaus.commons.compiler.CompileException: Line 29, Column 43: A method 
> named "compare" is not declared in any enclosing class nor any supertype, nor 
> through a static import
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
> {code}






[jira] [Updated] (SPARK-8687) Spark on yarn-client mode can't send `spark.yarn.credentials.file` to executor.

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8687:
-
Fix Version/s: 1.4.2

> Spark on yarn-client mode can't send `spark.yarn.credentials.file` to 
> executor.
> ---
>
> Key: SPARK-8687
> URL: https://issues.apache.org/jira/browse/SPARK-8687
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.5.0, 1.4.2
>
>
> YARN will set +spark.yarn.credentials.file+ after *DriverEndpoint* is 
> initialized, so the executor will fetch the old configuration, which causes 
> the problem.






[jira] [Updated] (SPARK-8687) Spark on yarn-client mode can't send `spark.yarn.credentials.file` to executor.

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8687:
-
Target Version/s: 1.5.0, 1.4.2  (was: 1.5.0)

> Spark on yarn-client mode can't send `spark.yarn.credentials.file` to 
> executor.
> ---
>
> Key: SPARK-8687
> URL: https://issues.apache.org/jira/browse/SPARK-8687
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.5.0, 1.4.2
>
>
> YARN will set +spark.yarn.credentials.file+ after *DriverEndpoint* is 
> initialized, so the executor will fetch the old configuration, which causes 
> the problem.






[jira] [Closed] (SPARK-8687) Spark on yarn-client mode can't send `spark.yarn.credentials.file` to executor.

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8687.

  Resolution: Fixed
Assignee: SaintBacchus
   Fix Version/s: 1.5.0
Target Version/s: 1.5.0

> Spark on yarn-client mode can't send `spark.yarn.credentials.file` to 
> executor.
> ---
>
> Key: SPARK-8687
> URL: https://issues.apache.org/jira/browse/SPARK-8687
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.5.0
>
>
> YARN will set +spark.yarn.credentials.file+ after *DriverEndpoint* is 
> initialized, so the executor will fetch the old configuration, which causes 
> the problem.






[jira] [Closed] (SPARK-3071) Increase default driver memory

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-3071.

   Resolution: Fixed
Fix Version/s: 1.5.0

> Increase default driver memory
> --
>
> Key: SPARK-3071
> URL: https://issues.apache.org/jira/browse/SPARK-3071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.4.2
>Reporter: Xiangrui Meng
>Assignee: Ilya Ganelin
> Fix For: 1.5.0
>
>
> The current default is 512M, which is usually too small because the user also 
> uses the driver to do some computation. In local mode, the executor memory 
> setting is ignored and only driver memory is used, which provides more 
> incentive to increase the default driver memory.
> I suggest:
> 1. 2GB in local mode, and warn users if executor memory is set to a bigger 
> value
> 2. the same as worker memory on an EC2 standalone server






[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527
 ] 

hujiayin edited comment on SPARK-5682 at 7/2/15 6:10 AM:
-

Steps were added to encode and decode the data, so performance will not be 
faster than before. At the same time, the code also has a security issue: for 
example, it saves plain text in a configuration file that is finally used as 
part of the key.

If you use a better cipher solution, the performance downgrade will be 
minimized; I think AES is a bit heavy.

Also, the feature is based on Hadoop 2.6, which is a limitation; that is why I 
said it relies on Hadoop.

Though the API is public and stable, you cannot ensure that the API will not 
be changed, since it is not commercial software.



was (Author: hujiayin):
Steps were added to encode and decode the data, the performance will not be 
fast than before, in the same time, codes also have security issue, for example 
save the plain text in configuration file and finally used as the part of the 
key

In the same time, the feature based on hadoop 2.6, it is the limitation, that 
is why i said rely on hadoop

Though the API is public stable, however, you cannot ensure if the API will not 
be changed since it is not the comercial software.


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle process 
> safer for data. This feature is necessary in Spark. AES is a specification 
> for the encryption of electronic data; CTR is one of its five common modes. 
> We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to 
> enable Spark encrypted shuffle; these are also used in Hadoop's encrypted 
> shuffle. JceAesCtrCryptoCodec uses encryption algorithms provided by the JDK, 
> while OpensslAesCtrCryptoCodec uses encryption algorithms provided by 
> OpenSSL. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Closed] (SPARK-8740) Support GitHub OAuth tokens in dev/merge_spark_pr.py

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8740.

  Resolution: Fixed
   Fix Version/s: 1.5.0
Target Version/s: 1.5.0

> Support GitHub OAuth tokens in dev/merge_spark_pr.py
> 
>
> Key: SPARK-8740
> URL: https://issues.apache.org/jira/browse/SPARK-8740
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
> Fix For: 1.5.0
>
>
> We should allow dev/merge_spark_pr.py to use personal GitHub OAuth tokens in 
> order to make authenticated requests. This is necessary to work around per-IP 
> rate limiting issues.
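For context, an authenticated GitHub API request just carries the token in an Authorization header; authenticated requests count against a per-token quota rather than the much smaller per-IP anonymous quota. A sketch with urllib (the token value and URL are placeholders; at the time, GitHub accepted the `token <value>` scheme):

```python
import urllib.request

token = "not-a-real-token"  # placeholder: in practice, read from an env var

# Build (but do not send) an authenticated request for a Spark pull request.
req = urllib.request.Request(
    "https://api.github.com/repos/apache/spark/pulls/7179",
    headers={"Authorization": "token " + token},
)

print(req.get_header("Authorization"))  # token not-a-real-token
```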






[jira] [Closed] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8769.

  Resolution: Fixed
Assignee: holdenk
   Fix Version/s: 1.4.2
  1.5.0
Target Version/s: 1.5.0, 1.4.2

> toLocalIterator should mention it results in many jobs
> --
>
> Key: SPARK-8769
> URL: https://issues.apache.org/jira/browse/SPARK-8769
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 1.5.0, 1.4.2
>
>
> toLocalIterator on RDDs should mention that it results in multiple jobs, and 
> that if the input was the result of a wide transformation, it should be 
> cached to avoid re-computation.






[jira] [Closed] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8771.

  Resolution: Fixed
Assignee: holdenk
   Fix Version/s: 1.5.0
Target Version/s: 1.5.0

> Actor system deprecation tag uses deprecated deprecation tag
> 
>
> Key: SPARK-8771
> URL: https://issues.apache.org/jira/browse/SPARK-8771
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.0
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 1.5.0
>
>
> The deprecation of the actor system adds a spurious build warning:
> {quote}
> @deprecated now takes two arguments; see the scaladoc.
> [warn]   @deprecated("Actor system is no longer supported as of 1.4")
> {quote}






[jira] [Updated] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8771:
-
Affects Version/s: 1.4.0

> Actor system deprecation tag uses deprecated deprecation tag
> 
>
> Key: SPARK-8771
> URL: https://issues.apache.org/jira/browse/SPARK-8771
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.0
>Reporter: holdenk
>Priority: Trivial
>
> The deprecation of the actor system adds a spurious build warning:
> {quote}
> @deprecated now takes two arguments; see the scaladoc.
> [warn]   @deprecated("Actor system is no longer supported as of 1.4")
> {quote}






[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527
 ] 

hujiayin edited comment on SPARK-5682 at 7/2/15 6:03 AM:
-

Steps were added to encode and decode the data, so the performance will not be 
faster than before. At the same time, the code also has a security issue: for 
example, the plain text is saved in a configuration file and is finally used 
as part of the key.

In addition, the feature is based on Hadoop 2.6, which is a limitation; that 
is why I said it relies on Hadoop.

Though the API is public and stable, you cannot guarantee that it will never 
be changed, since it is not commercial software.



was (Author: hujiayin):
steps were added to encode and decode the data, the performance will not be 
fast than before, in the same time, codes also have security issue, for example 
save the plain text in configuration file and finally used as the part of the 
key

in the same time, the feature based on hadoop 2.6, it is the limitation, that 
is why i said rely on hadoop

though it is public stable, however, you cannot ensure if the api will not be 
changed since it was not the comercial software.


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer. This feature is necessary in Spark. AES is a specification for the 
> encryption of electronic data; CTR is one of its five common modes. We use 
> two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, which are 
> also used in Hadoop's encrypted shuffle, to enable Spark encrypted shuffle. 
> JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while 
> OpensslAesCtrCryptoCodec uses the ones OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.
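For illustration only, a toy of the CTR structure the two codecs share: a keystream is generated from a counter and XORed with the data, so encryption and decryption are the same operation. SHA-256 stands in for AES here purely to keep the sketch dependency-free; it is not the real codec.

```python
import hashlib

def ctr_keystream(key, nonce, nblocks):
    # Block i of the keystream = H(key || nonce || counter). Real AES-CTR
    # encrypts the counter block with AES instead of hashing it.
    for i in range(nblocks):
        yield hashlib.sha256(key + nonce + i.to_bytes(16, "big")).digest()

def ctr_xor(key, nonce, data):
    # ciphertext = data XOR keystream, so this function both encrypts
    # and decrypts.
    stream = b"".join(ctr_keystream(key, nonce, len(data) // 32 + 1))
    return bytes(b ^ s for b, s in zip(data, stream))

ciphertext = ctr_xor(b"k" * 16, b"n" * 8, b"shuffle block bytes")
```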






[jira] [Closed] (SPARK-8688) Hadoop Configuration has to disable client cache when writing or reading delegation tokens.

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8688.

  Resolution: Fixed
Assignee: SaintBacchus
   Fix Version/s: 1.5.0
Target Version/s: 1.5.0

> Hadoop Configuration has to disable client cache when writing or reading 
> delegation tokens.
> ---
>
> Key: SPARK-8688
> URL: https://issues.apache.org/jira/browse/SPARK-8688
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.5.0
>
>
> In the classes *AMDelegationTokenRenewer* and 
> *ExecutorDelegationTokenUpdater*, Spark writes and reads the credentials.
> But if we don't set *fs.hdfs.impl.disable.cache*, Spark will use a cached 
> FileSystem (which holds the old token) to upload or download files.
> Then, once the old token expires, it can no longer authenticate to read from 
> or write to HDFS.
> (I only tested for a very short time with the configuration:
> dfs.namenode.delegation.token.renew-interval=3min
> dfs.namenode.delegation.token.max-lifetime=10min
> I'm not sure whether it matters.)
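A toy model (names are illustrative, not Hadoop's API) of why the cache matters: a cached client keeps the token it was created with, while disabling the cache picks up the renewed one:

```python
class FsClient:
    """Stand-in for a Hadoop FileSystem holding a delegation token."""
    def __init__(self, token):
        self.token = token

    def put(self, valid_tokens):
        if self.token not in valid_tokens:
            raise PermissionError("delegation token expired")
        return "ok"

_cache = {}

def get_client(uri, token, disable_cache=False):
    # With caching on, the first client for a URI is reused forever,
    # old token and all; disable_cache mirrors fs.hdfs.impl.disable.cache.
    if disable_cache:
        return FsClient(token)
    return _cache.setdefault(uri, FsClient(token))

stale = get_client("hdfs://nn", "t1")           # created before renewal
cached = get_client("hdfs://nn", "t2")          # cache hit: still holds t1
fresh = get_client("hdfs://nn", "t2", disable_cache=True)
```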






[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527
 ] 

hujiayin edited comment on SPARK-5682 at 7/2/15 6:02 AM:
-

Steps were added to encode and decode the data, so the performance will not be 
faster than before. At the same time, the code also has a security issue: for 
example, the plain text is saved in a configuration file and is finally used 
as part of the key.

In addition, the feature is based on Hadoop 2.6, which is a limitation; that 
is why I said it relies on Hadoop.

Though the API is public and stable, you cannot guarantee that it will never 
be changed, since it is not commercial software.



was (Author: hujiayin):
steps were added to encode and decode the data, the performance will not be 
fast than before, in the same time, codes also have security issue, for example 
save the plain text in configuration file and finally used as the part of the 
key

in the same time, the feature based on hadoop 2.6, it is the limitation, that 
is why i said reply on hadoop

though it is public stable, however, you cannot ensure if the api will not be 
changed since it was not the comercial software.


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer. This feature is necessary in Spark. AES is a specification for the 
> encryption of electronic data; CTR is one of its five common modes. We use 
> two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, which are 
> also used in Hadoop's encrypted shuffle, to enable Spark encrypted shuffle. 
> JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while 
> OpensslAesCtrCryptoCodec uses the ones OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527
 ] 

hujiayin commented on SPARK-5682:
-

Steps were added to encode and decode the data, so the performance will not be 
faster than before. At the same time, the code also has a security issue: for 
example, the plain text is saved in a configuration file and is finally used 
as part of the key.

In addition, the feature is based on Hadoop 2.6, which is a limitation; that 
is why I said it relies on Hadoop.

Though the API is public and stable, you cannot guarantee that it will never 
be changed, since it is not commercial software.


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer. This feature is necessary in Spark. AES is a specification for the 
> encryption of electronic data; CTR is one of its five common modes. We use 
> two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, which are 
> also used in Hadoop's encrypted shuffle, to enable Spark encrypted shuffle. 
> JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while 
> OpensslAesCtrCryptoCodec uses the ones OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Closed] (SPARK-8754) YarnClientSchedulerBackend doesn't stop gracefully in failure conditions

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-8754.

  Resolution: Fixed
   Fix Version/s: 1.4.2
  1.5.0
Target Version/s: 1.5.0, 1.4.2

> YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
> 
>
> Key: SPARK-8754
> URL: https://issues.apache.org/jira/browse/SPARK-8754
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.4.0
>Reporter: Devaraj K
>Priority: Minor
> Fix For: 1.5.0, 1.4.2
>
>
> {code:xml}
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:421)
> at 
> org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1447)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1651)
> at org.apache.spark.SparkContext.(SparkContext.scala:572)
> at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
> at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:621)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> If the application has FINISHED/FAILED/KILLED, or failed to launch the 
> application master, monitorThread is never initialized, yet 
> monitorThread.interrupt() is invoked as part of stop() without any check. 
> This throws an NPE and also prevents the client from stopping.
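The shape of the fix is just a null check before touching the monitor thread. A toy sketch (Python threads cannot be interrupted, so join() stands in; the point is the guard):

```python
import threading

class YarnClientBackend:
    def __init__(self, launched):
        # monitorThread is only created once the app launches successfully.
        self.monitor_thread = None
        if launched:
            self.monitor_thread = threading.Thread(target=lambda: None)
            self.monitor_thread.start()

    def stop(self):
        # Guard against the failure path where the thread was never started;
        # the unguarded call was the source of the NPE.
        if self.monitor_thread is not None:
            self.monitor_thread.join()
        return "stopped"
```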






[jira] [Commented] (SPARK-8708) MatrixFactorizationModel.predictAll() populates single partition only

2015-07-01 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611505#comment-14611505
 ] 

Xiangrui Meng commented on SPARK-8708:
--

[~antonymayi] In your real case, how many partitions did ALS.predictAll return?

> MatrixFactorizationModel.predictAll() populates single partition only
> -
>
> Key: SPARK-8708
> URL: https://issues.apache.org/jira/browse/SPARK-8708
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Antony Mayi
>
> When using mllib.recommendation.ALS, the RDD returned by .predictAll() has 
> all values pushed into a single partition despite using quite high 
> parallelism.
> This degrades performance of further processing. (I can obviously run 
> .partitionBy() to balance it, but that's still too costly, e.g. if running 
> .predictAll() in a loop for thousands of products; it should rather be 
> possible to do this on the model somehow, automatically.)
> Below is an example on a tiny sample (same on a large dataset):
> {code:title=pyspark}
> >>> r1 = (1, 1, 1.0)
> >>> r2 = (1, 2, 2.0)
> >>> r3 = (2, 1, 2.0)
> >>> r4 = (2, 2, 2.0)
> >>> r5 = (3, 1, 1.0)
> >>> ratings = sc.parallelize([r1, r2, r3, r4, r5], 5)
> >>> ratings.getNumPartitions()
> 5
> >>> users = ratings.map(itemgetter(0)).distinct()
> >>> model = ALS.trainImplicit(ratings, 1, seed=10)
> >>> predictions_for_2 = model.predictAll(users.map(lambda u: (u, 2)))
> >>> predictions_for_2.glom().map(len).collect()
> [0, 0, 3, 0, 0]
> {code}
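The .partitionBy() workaround mentioned in the description amounts to re-hashing the results by key; a plain-Python model of that redistribution (no Spark involved):

```python
def partition_by(records, num_partitions, key=lambda kv: kv[0]):
    # Hash-partition records by key, as RDD.partitionBy would, so results
    # that all landed in one partition get spread out again.
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[hash(key(rec)) % num_partitions].append(rec)
    return parts

# (user, product, rating) predictions that all sit in a single partition.
skewed = [(user, 2, 0.5) for user in range(10)]
balanced = partition_by(skewed, 5)
```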






[jira] [Created] (SPARK-8782) GenerateOrdering fails for NullType (i.e. ORDER BY NULL crashes)

2015-07-01 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-8782:
-

 Summary: GenerateOrdering fails for NullType (i.e. ORDER BY NULL 
crashes)
 Key: SPARK-8782
 URL: https://issues.apache.org/jira/browse/SPARK-8782
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 1.5.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker


Queries containing ORDER BY NULL currently result in a code generation 
exception:

{code}
  public SpecificOrdering 
generate(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
return new SpecificOrdering(expr);
  }

  class SpecificOrdering extends 
org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {

private org.apache.spark.sql.catalyst.expressions.Expression[] 
expressions = null;

public 
SpecificOrdering(org.apache.spark.sql.catalyst.expressions.Expression[] expr) {
  expressions = expr;
}

@Override
public int compare(InternalRow a, InternalRow b) {
  InternalRow i = null;  // Holds current row being evaluated.
  
  i = a;
  final Object primitive1 = null;
  i = b;
  final Object primitive3 = null;
  if (true && true) {
// Nothing
  } else if (true) {
return -1;
  } else if (true) {
return 1;
  } else {
int comp = primitive1.compare(primitive3);
if (comp != 0) {
  return comp;
}
  }
  
  return 0;
}
  }
org.codehaus.commons.compiler.CompileException: Line 29, Column 43: A method 
named "compare" is not declared in any enclosing class nor any supertype, nor 
through a static import
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
{code}
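The generated comparator above degenerates because every branch condition is `true` for NullType. The ordering it should implement treats two NULLs as equal and sorts NULL before any value (nulls-first, the usual ascending default); a sketch of that contract:

```python
import functools

def null_safe_compare(a, b):
    # NULL == NULL; NULL sorts first; otherwise compare the values.
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    return (a > b) - (a < b)

ordered = sorted([3, None, 1, None, 2],
                 key=functools.cmp_to_key(null_safe_compare))
```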







[jira] [Created] (SPARK-8781) Published POMs are no longer effective POMs

2015-07-01 Thread Konstantin Shaposhnikov (JIRA)
Konstantin Shaposhnikov created SPARK-8781:
--

 Summary: Published POMs are no longer effective POMs
 Key: SPARK-8781
 URL: https://issues.apache.org/jira/browse/SPARK-8781
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.3.2, 1.4.1, 1.5.0
Reporter: Konstantin Shaposhnikov


POMs published to the Maven repository are no longer effective POMs. E.g. 

In 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/1.4.2-SNAPSHOT/spark-core_2.11-1.4.2-20150702.043114-52.pom:

{noformat}
...

org.apache.spark
spark-launcher_${scala.binary.version}
${project.version}

...
{noformat}

while it should be

{noformat}
...

org.apache.spark
spark-launcher_2.11
${project.version}

...
{noformat}


The following commits are most likely the cause of it:
- for branch-1.3: 
https://github.com/apache/spark/commit/ce137b8ed3b240b7516046699ac96daa55ddc129
- for branch-1.4: 
https://github.com/apache/spark/commit/84da653192a2d9edb82d0dbe50f577c4dc6a0c78
- for master: 
https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

On branch-1.4 reverting the commit fixed the issue.

See SPARK-3812 for additional details
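What "effective POM" means here is that property placeholders like ${scala.binary.version} get resolved before publishing. A toy of that interpolation step (not Maven's actual implementation):

```python
import re

def interpolate(text, props):
    # Replace ${name} with its property value; unknown properties are
    # left intact, as ${project.version} legitimately is in the good POM.
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)), text)

artifact = interpolate("spark-launcher_${scala.binary.version}",
                       {"scala.binary.version": "2.11"})
```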






[jira] [Created] (SPARK-8780) Move Python doctest code example from models to algorithms

2015-07-01 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-8780:
--

 Summary: Move Python doctest code example from models to algorithms
 Key: SPARK-8780
 URL: https://issues.apache.org/jira/browse/SPARK-8780
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Affects Versions: 1.5.0
Reporter: Yanbo Liang


Almost all doctest code examples in PySpark MLlib live in the model classes.
Since users usually start with algorithms rather than models, we should move 
them from the models to the algorithms.






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611491#comment-14611491
 ] 

liyunzhang_intel commented on SPARK-5682:
-

[~hujiayin]: thanks for your comment.
{quote}
The solution relied on hadoop API and maybe downgrade the performance. 
{quote}
On "the solution relied on the Hadoop API": you mean that I use 
org.apache.hadoop.io.Text in [CommonConfigurationKeys 
|https://github.com/apache/spark/pull/4491/files#diff-a76c55d0e8f2e4e1a6cb5848826585fe].
But I have a different view of this:
{code}
// From org.apache.hadoop.io.Text:
@Stringable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Text extends BinaryComparable
{code}

This shows that org.apache.hadoop.io.Text is marked stable, which means the 
interfaces it provides should not change much in later releases.

On "downgrade the performance": do you have any test results showing this?
 



> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer. This feature is necessary in Spark. AES is a specification for the 
> encryption of electronic data; CTR is one of its five common modes. We use 
> two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, which are 
> also used in Hadoop's encrypted shuffle, to enable Spark encrypted shuffle. 
> JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while 
> OpensslAesCtrCryptoCodec uses the ones OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Resolved] (SPARK-8227) math function: unhex

2015-07-01 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8227.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7113
[https://github.com/apache/spark/pull/7113]

> math function: unhex
> 
>
> Key: SPARK-8227
> URL: https://issues.apache.org/jira/browse/SPARK-8227
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
> Fix For: 1.5.0
>
>
> unhex(STRING a): BINARY
> Inverse of hex. Interprets each pair of characters as a hexadecimal number 
> and converts to the byte representation of the number.
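A minimal sketch of those semantics; the odd-length zero-padding and None-on-invalid behaviour are assumptions here, since engines differ on the edge cases:

```python
def unhex(s):
    # Interpret each pair of characters as a hex number and return the
    # byte representation; pad odd-length input with a leading zero.
    if len(s) % 2:
        s = "0" + s
    try:
        return bytes.fromhex(s)
    except ValueError:
        return None   # invalid hex digits
```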






[jira] [Created] (SPARK-8779) Add documentation for Python's FP-growth

2015-07-01 Thread Hrishikesh (JIRA)
Hrishikesh created SPARK-8779:
-

 Summary: Add documentation for Python's FP-growth
 Key: SPARK-8779
 URL: https://issues.apache.org/jira/browse/SPARK-8779
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, MLlib, PySpark
Reporter: Hrishikesh
Priority: Minor


We need to add documentation for Python FP-Growth in the MLlib Programming 
Guide.






[jira] [Commented] (SPARK-8224) math function: shiftright

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611459#comment-14611459
 ] 

Apache Spark commented on SPARK-8224:
-

User 'zhichao-li' has created a pull request for this issue:
https://github.com/apache/spark/pull/7035

> math function: shiftright
> -
>
> Key: SPARK-8224
> URL: https://issues.apache.org/jira/browse/SPARK-8224
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftrightunsigned(INT a), shiftrightunsigned(BIGINT a)   
> Bitwise unsigned right shift (as of Hive 1.2.0). Returns int for tinyint, 
> smallint and int a. Returns bigint for bigint a.
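Python integers are unbounded, so a fixed-width unsigned right shift has to be emulated explicitly; a sketch of the 32-bit INT case:

```python
def shiftrightunsigned(a, b, bits=32):
    # Reinterpret `a` as an unsigned `bits`-wide value, shift, then map the
    # result back to the signed range. Java's >>> masks the shift count,
    # hence the `b % bits`.
    ua = a & ((1 << bits) - 1)
    r = ua >> (b % bits)
    if r >= 1 << (bits - 1):
        r -= 1 << bits
    return r
```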






[jira] [Commented] (SPARK-8223) math function: shiftleft

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611458#comment-14611458
 ] 

Apache Spark commented on SPARK-8223:
-

User 'zhichao-li' has created a pull request for this issue:
https://github.com/apache/spark/pull/7035

> math function: shiftleft
> 
>
> Key: SPARK-8223
> URL: https://issues.apache.org/jira/browse/SPARK-8223
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftleft(INT a)
> shiftleft(BIGINT a)
> Bitwise left shift (as of Hive 1.2.0). Returns int for tinyint, smallint and 
> int a. Returns bigint for bigint a.
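The fixed-width semantics matter for shiftleft too: the result wraps modulo 2^bits and is reinterpreted as signed. A sketch:

```python
def shiftleft(a, b, bits=32):
    # Shift, drop bits beyond the width, and map back to the signed range.
    r = (a << (b % bits)) & ((1 << bits) - 1)
    if r >= 1 << (bits - 1):
        r -= 1 << bits
    return r
```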






[jira] [Assigned] (SPARK-8223) math function: shiftleft

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8223:
---

Assignee: Apache Spark  (was: zhichao-li)

> math function: shiftleft
> 
>
> Key: SPARK-8223
> URL: https://issues.apache.org/jira/browse/SPARK-8223
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> shiftleft(INT a)
> shiftleft(BIGINT a)
> Bitwise left shift (as of Hive 1.2.0). Returns int for tinyint, smallint and 
> int a. Returns bigint for bigint a.






[jira] [Commented] (SPARK-8223) math function: shiftleft

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611448#comment-14611448
 ] 

Apache Spark commented on SPARK-8223:
-

User 'tarekauel' has created a pull request for this issue:
https://github.com/apache/spark/pull/7178

> math function: shiftleft
> 
>
> Key: SPARK-8223
> URL: https://issues.apache.org/jira/browse/SPARK-8223
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftleft(INT a)
> shiftleft(BIGINT a)
> Bitwise left shift (as of Hive 1.2.0). Returns int for tinyint, smallint and 
> int a. Returns bigint for bigint a.






[jira] [Commented] (SPARK-8224) math function: shiftright

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611449#comment-14611449
 ] 

Apache Spark commented on SPARK-8224:
-

User 'tarekauel' has created a pull request for this issue:
https://github.com/apache/spark/pull/7178

> math function: shiftright
> -
>
> Key: SPARK-8224
> URL: https://issues.apache.org/jira/browse/SPARK-8224
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftrightunsigned(INT a), shiftrightunsigned(BIGINT a)   
> Bitwise unsigned right shift (as of Hive 1.2.0). Returns int for tinyint, 
> smallint and int a. Returns bigint for bigint a.






[jira] [Assigned] (SPARK-8224) math function: shiftright

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8224:
---

Assignee: Apache Spark  (was: zhichao-li)

> math function: shiftright
> -
>
> Key: SPARK-8224
> URL: https://issues.apache.org/jira/browse/SPARK-8224
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> shiftrightunsigned(INT a), shiftrightunsigned(BIGINT a)   
> Bitwise unsigned right shift (as of Hive 1.2.0). Returns int for tinyint, 
> smallint and int a. Returns bigint for bigint a.






[jira] [Assigned] (SPARK-8223) math function: shiftleft

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8223:
---

Assignee: zhichao-li  (was: Apache Spark)

> math function: shiftleft
> 
>
> Key: SPARK-8223
> URL: https://issues.apache.org/jira/browse/SPARK-8223
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftleft(INT a)
> shiftleft(BIGINT a)
> Bitwise left shift (as of Hive 1.2.0). Returns int for tinyint, smallint and 
> int a. Returns bigint for bigint a.






[jira] [Assigned] (SPARK-8224) math function: shiftright

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8224:
---

Assignee: zhichao-li  (was: Apache Spark)

> math function: shiftright
> -
>
> Key: SPARK-8224
> URL: https://issues.apache.org/jira/browse/SPARK-8224
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: zhichao-li
>
> shiftrightunsigned(INT a), shiftrightunsigned(BIGINT a)   
> Bitwise unsigned right shift (as of Hive 1.2.0). Returns int for tinyint, 
> smallint and int a. Returns bigint for bigint a.






[jira] [Resolved] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-8770.

Resolution: Fixed

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.
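In Python terms, the proposed split looks like this (a toy, not Catalyst's actual classes): the operator base class carries the same-type assumption, so the analyzer knows exactly where its casting rule applies:

```python
class BinaryOperator:
    """Binary expression whose children must share a type."""
    def __init__(self, left, right):
        if type(left) is not type(right):
            raise TypeError("BinaryOperator requires same-typed children")
        self.left, self.right = left, right

class Add(BinaryOperator):
    def eval(self):
        return self.left + self.right
```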






[jira] [Commented] (SPARK-8765) Flaky PySpark PowerIterationClustering test

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611440#comment-14611440
 ] 

Apache Spark commented on SPARK-8765:
-

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/7177

> Flaky PySpark PowerIterationClustering test
> ---
>
> Key: SPARK-8765
> URL: https://issues.apache.org/jira/browse/SPARK-8765
> Project: Spark
>  Issue Type: Test
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>  Labels: flaky-test
>
> See failure: 
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
> {code}
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 291, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(model.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
> File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/clustering.py",
>  line 299, in __main__.PowerIterationClusteringModel
> Failed example:
> sorted(sameModel.assignments().collect())
> Expected:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0), ...
> Got:
> [Assignment(id=0, cluster=1), Assignment(id=1, cluster=1), 
> Assignment(id=2, cluster=1), Assignment(id=3, cluster=1), Assignment(id=4, 
> cluster=0)]
> **
>2 of  13 in __main__.PowerIterationClusteringModel
> ***Test Failed*** 2 failures.
> Had test failures in pyspark.mllib.clustering with python2.6; see logs.
> {code}
> CC: [~mengxr] [~yanboliang]






[jira] [Updated] (SPARK-8778) The item of "Scheduler delay" is not consistent between "Event Timeline" and "Task List"

2015-07-01 Thread zhangxiongfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangxiongfei updated SPARK-8778:
-
Description: On the "Details for Stage" page of the Spark Web UI, the 
"Scheduler delay" shown for some running tasks in the "Event Timeline" is not 
consistent with the value shown in "Tasks". I attached 2 snapshots. In the 
"Event Timeline" section, almost all of the time is attributed to "Scheduler 
delay"; however, the "Scheduler delay" is 0 in the "Tasks" section.  (was: In  
the page "Details for Stage" of Spark Web UI,  "Scheduler delay" of some 
running tasks showed in "Event Timeline"  is not displayed consistent with 
that showed in "Tasks".I attached 2 snapshots.In "Event Timeline" section 
,almost all the time is about "Scheduler delay",however, the "Scheduler delay" 
is 0 in the "Task" section.)

> The item of "Scheduler delay" is not consistent between "Event Timeline" and 
> "Task List"
> 
>
> Key: SPARK-8778
> URL: https://issues.apache.org/jira/browse/SPARK-8778
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: zhangxiongfei
>Priority: Minor
> Attachments: Event Timeline.png, Tasks.png
>
>
> On the "Details for Stage" page of the Spark Web UI, the "Scheduler delay" 
> shown for some running tasks in the "Event Timeline" is not consistent with 
> the value shown in "Tasks". I attached 2 snapshots. In the "Event Timeline" 
> section, almost all of the time is attributed to "Scheduler delay"; however, 
> the "Scheduler delay" is 0 in the "Tasks" section.






[jira] [Updated] (SPARK-8778) The item of "Scheduler delay" is not consistent between "Event Timeline" and "Task List"

2015-07-01 Thread zhangxiongfei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangxiongfei updated SPARK-8778:
-
Attachment: Event Timeline.png
Tasks.png

> The item of "Scheduler delay" is not consistent between "Event Timeline" and 
> "Task List"
> 
>
> Key: SPARK-8778
> URL: https://issues.apache.org/jira/browse/SPARK-8778
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: zhangxiongfei
>Priority: Minor
> Attachments: Event Timeline.png, Tasks.png
>
>
> On the "Details for Stage" page of the Spark Web UI, the "Scheduler delay" 
> shown for some running tasks in the "Event Timeline" is not consistent with 
> the value shown in "Tasks". I attached 2 snapshots. In the "Event Timeline" 
> section, almost all of the time is attributed to "Scheduler delay"; however, 
> the "Scheduler delay" is 0 in the "Tasks" section.






[jira] [Created] (SPARK-8778) The item of "Scheduler delay" is not consistent between "Event Timeline" and "Task List"

2015-07-01 Thread zhangxiongfei (JIRA)
zhangxiongfei created SPARK-8778:


 Summary: The item of "Scheduler delay" is not consistent between 
"Event Timeline" and "Task List"
 Key: SPARK-8778
 URL: https://issues.apache.org/jira/browse/SPARK-8778
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0
Reporter: zhangxiongfei
Priority: Minor


On the "Details for Stage" page of the Spark Web UI, the "Scheduler delay" 
shown for some running tasks in the "Event Timeline" is not consistent with 
the value shown in "Tasks". I attached 2 snapshots. In the "Event Timeline" 
section, almost all of the time is attributed to "Scheduler delay"; however, 
the "Scheduler delay" is 0 in the "Tasks" section.






[jira] [Assigned] (SPARK-8777) Add random data generation test utilities to Spark SQL

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8777:
---

Assignee: Apache Spark  (was: Josh Rosen)

> Add random data generation test utilities to Spark SQL
> --
>
> Key: SPARK-8777
> URL: https://issues.apache.org/jira/browse/SPARK-8777
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> We should add utility functions for generating data that conforms to a given 
> SparkSQL DataType or Schema. This would make it significantly easier to write 
> certain types of tests.
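The proposal above can be sketched in plain Python (hypothetical type names and helpers; Spark's actual DataType classes are richer): generate a random row that conforms to a simple schema, seeded so tests stay reproducible.

```python
import random
import string

def random_value(dtype, rng):
    """Produce a random value for a (simplified, hypothetical) SQL type name."""
    if dtype == "int":
        return rng.randint(-2**31, 2**31 - 1)
    if dtype == "double":
        return rng.uniform(-1e6, 1e6)
    if dtype == "string":
        return "".join(rng.choice(string.ascii_letters) for _ in range(8))
    raise ValueError(f"unsupported type: {dtype}")

def random_row(schema, seed=0):
    """Generate one row conforming to a [(name, type), ...] schema."""
    rng = random.Random(seed)  # seeded for reproducible tests
    return {name: random_value(dtype, rng) for name, dtype in schema}

row = random_row([("id", "int"), ("score", "double"), ("name", "string")])
print(sorted(row))  # ['id', 'name', 'score']
```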






[jira] [Assigned] (SPARK-8777) Add random data generation test utilities to Spark SQL

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8777:
---

Assignee: Josh Rosen  (was: Apache Spark)

> Add random data generation test utilities to Spark SQL
> --
>
> Key: SPARK-8777
> URL: https://issues.apache.org/jira/browse/SPARK-8777
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> We should add utility functions for generating data that conforms to a given 
> SparkSQL DataType or Schema. This would make it significantly easier to write 
> certain types of tests.






[jira] [Commented] (SPARK-8777) Add random data generation test utilities to Spark SQL

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611404#comment-14611404
 ] 

Apache Spark commented on SPARK-8777:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7176

> Add random data generation test utilities to Spark SQL
> --
>
> Key: SPARK-8777
> URL: https://issues.apache.org/jira/browse/SPARK-8777
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> We should add utility functions for generating data that conforms to a given 
> SparkSQL DataType or Schema. This would make it significantly easier to write 
> certain types of tests.






[jira] [Created] (SPARK-8777) Add random data generation test utilities to Spark SQL

2015-07-01 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-8777:
-

 Summary: Add random data generation test utilities to Spark SQL
 Key: SPARK-8777
 URL: https://issues.apache.org/jira/browse/SPARK-8777
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Josh Rosen
Assignee: Josh Rosen


We should add utility functions for generating data that conforms to a given 
SparkSQL DataType or Schema. This would make it significantly easier to write 
certain types of tests.






[jira] [Created] (SPARK-8776) Increase the default MaxPermSize

2015-07-01 Thread Yin Huai (JIRA)
Yin Huai created SPARK-8776:
---

 Summary: Increase the default MaxPermSize
 Key: SPARK-8776
 URL: https://issues.apache.org/jira/browse/SPARK-8776
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Yin Huai


Since 1.4.0, Spark SQL has used isolated class loaders to separate the Hive 
dependencies of the metastore and execution, which increases PermGen memory 
consumption. How about increasing the default size from 128m to 256m? The 
change we need to make seems to be 
https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java#L139.
 






[jira] [Created] (SPARK-8775) Move expression specific type coercion into expression themselves

2015-07-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8775:
--

 Summary: Move expression specific type coercion into expression 
themselves
 Key: SPARK-8775
 URL: https://issues.apache.org/jira/browse/SPARK-8775
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin









[jira] [Assigned] (SPARK-8772) Implement implicit type cast for expressions that define expected input types

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8772:
---

Assignee: Reynold Xin  (was: Apache Spark)

> Implement implicit type cast for expressions that define expected input types
> -
>
> Key: SPARK-8772
> URL: https://issues.apache.org/jira/browse/SPARK-8772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We should have an engine-wide implicit cast rule defined.
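A toy sketch of what an engine-wide rule could look like (assumption, not Spark's actual coercion logic; type names and the widening table are illustrative): a single table of allowed implicit widenings, consulted once by the analyzer instead of per-expression casting code.

```python
# Hypothetical widening table: each type lists the types it may be
# implicitly cast to, in increasing width.
WIDENS_TO = {
    "tinyint": ["smallint", "int", "bigint", "float", "double"],
    "smallint": ["int", "bigint", "float", "double"],
    "int": ["bigint", "float", "double"],
    "bigint": ["float", "double"],
    "float": ["double"],
}

def implicit_cast(actual, expected):
    """Return expected if actual can be implicitly widened to it, else None."""
    if actual == expected:
        return expected
    return expected if expected in WIDENS_TO.get(actual, []) else None

print(implicit_cast("int", "bigint"))  # bigint
print(implicit_cast("double", "int"))  # None (no implicit narrowing)
```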






[jira] [Commented] (SPARK-8772) Implement implicit type cast for expressions that define expected input types

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611360#comment-14611360
 ] 

Apache Spark commented on SPARK-8772:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7175

> Implement implicit type cast for expressions that define expected input types
> -
>
> Key: SPARK-8772
> URL: https://issues.apache.org/jira/browse/SPARK-8772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We should have an engine-wide implicit cast rule defined.






[jira] [Assigned] (SPARK-8772) Implement implicit type cast for expressions that define expected input types

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8772:
---

Assignee: Apache Spark  (was: Reynold Xin)

> Implement implicit type cast for expressions that define expected input types
> -
>
> Key: SPARK-8772
> URL: https://issues.apache.org/jira/browse/SPARK-8772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> We should have an engine-wide implicit cast rule defined.






[jira] [Reopened] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reopened SPARK-8770:


Reverted the commit since it failed Python tests.


> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Commented] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611351#comment-14611351
 ] 

Apache Spark commented on SPARK-8770:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7174

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Commented] (SPARK-7483) [MLLib] Using Kryo with FPGrowth fails with an exception

2015-07-01 Thread S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611347#comment-14611347
 ] 

S commented on SPARK-7483:
--

I encountered the same bug.
Adding 
sparkConf.registerKryoClasses(Array(classOf[ArrayBuffer[String]], 
classOf[ListBuffer[String]]))
seems to fix the problem.

> [MLLib] Using Kryo with FPGrowth fails with an exception
> 
>
> Key: SPARK-7483
> URL: https://issues.apache.org/jira/browse/SPARK-7483
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.1
>Reporter: Tomasz Bartczak
>Priority: Minor
>
> When using FPGrowth algorithm with KryoSerializer - Spark fails with
> {code}
> Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 9.0 (TID 16, localhost): 
> com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: 
> Can not set final scala.collection.mutable.ListBuffer field 
> org.apache.spark.mllib.fpm.FPTree$Summary.nodes to 
> scala.collection.mutable.ArrayBuffer
> Serialization trace:
> nodes (org.apache.spark.mllib.fpm.FPTree$Summary)
> org$apache$spark$mllib$fpm$FPTree$$summaries 
> (org.apache.spark.mllib.fpm.FPTree)
> {code}
> This can be easily reproduced in spark codebase by setting 
> {code}
> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> {code} and running FPGrowthSuite.






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611328#comment-14611328
 ] 

hujiayin commented on SPARK-5682:
-

The solution relies on the Hadoop API and may degrade performance. The AES 
algorithm is used for block data encryption in many cases. I think RC4 could 
be used to encrypt the stream, or a simpler solution with an authentication 
header could be used.   : )

> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle process 
> safer. This feature is necessary in Spark. AES is a specification for the 
> encryption of electronic data; there are 5 common modes in AES, and CTR is 
> one of them. We use two codecs, JceAesCtrCryptoCodec and 
> OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; they are also 
> used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption 
> algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses those 
> OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.






[jira] [Commented] (SPARK-7835) Refactor HeartbeatReceiverSuite for coverage and clean up

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611315#comment-14611315
 ] 

Apache Spark commented on SPARK-7835:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/7173

> Refactor HeartbeatReceiverSuite for coverage and clean up
> -
>
> Key: SPARK-7835
> URL: https://issues.apache.org/jira/browse/SPARK-7835
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> As of the writing of this description, the existing test suite has a lot of 
> duplicate code and doesn't even cover the most fundamental feature of the 
> HeartbeatReceiver, which is expiring hosts that have not responded in a while.
> https://github.com/apache/spark/blob/31d5d463e76b6611c854c6cf27059fec8198adc9/core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala
> We should rewrite this test suite to increase coverage and decrease duplicate 
> code.
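The fundamental behavior the rewritten suite should cover can be sketched in plain Python (names hypothetical; the real HeartbeatReceiver is Scala and event-driven): expire hosts whose last heartbeat is older than the timeout.

```python
import time

def expired_hosts(last_heartbeat, timeout_s, now=None):
    """Return hosts that have not sent a heartbeat within timeout_s seconds."""
    now = time.time() if now is None else now
    return sorted(h for h, t in last_heartbeat.items() if now - t > timeout_s)

# Host "a" last responded 100s ago and "b" 5s ago; with a 60s timeout
# only "a" is expired.
print(expired_hosts({"a": 0.0, "b": 95.0}, 60, now=100.0))  # ['a']
```

A test built on this shape can inject `now` directly, avoiding the sleeps that often make such suites flaky.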






[jira] [Commented] (SPARK-8708) MatrixFactorizationModel.predictAll() populates single partition only

2015-07-01 Thread Antony Mayi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611304#comment-14611304
 ] 

Antony Mayi commented on SPARK-8708:


I suspect this issue of all the data ending up in a single partition is also 
causing the following problem of mine:

At some point (seemingly at random) my whole execution gets stuck after one of 
the .predictAll() calls, and there is a message in the error log saying:

bq. 15/07/01 21:54:15 INFO shuffle.ShuffleMemoryManager: Thread 16468 waiting 
for at least 1/2N of shuffle memory pool to be free

I then have to kill the processing. Is there any simple workaround for this?

> MatrixFactorizationModel.predictAll() populates single partition only
> -
>
> Key: SPARK-8708
> URL: https://issues.apache.org/jira/browse/SPARK-8708
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Antony Mayi
>
> When using mllib.recommendation.ALS, the RDD returned by .predictAll() has 
> all values pushed into a single partition despite using quite high 
> parallelism.
> This degrades the performance of further processing. I can obviously run 
> .partitionBy() to balance it, but that is still too costly (e.g. if running 
> .predictAll() in a loop for thousands of products); it should be possible to 
> do this somehow on the model (automatically).
> Below is an example on a tiny sample (same on a large dataset):
> {code:title=pyspark}
> >>> r1 = (1, 1, 1.0)
> >>> r2 = (1, 2, 2.0)
> >>> r3 = (2, 1, 2.0)
> >>> r4 = (2, 2, 2.0)
> >>> r5 = (3, 1, 1.0)
> >>> ratings = sc.parallelize([r1, r2, r3, r4, r5], 5)
> >>> ratings.getNumPartitions()
> 5
> >>> users = ratings.map(itemgetter(0)).distinct()
> >>> model = ALS.trainImplicit(ratings, 1, seed=10)
> >>> predictions_for_2 = model.predictAll(users.map(lambda u: (u, 2)))
> >>> predictions_for_2.glom().map(len).collect()
> [0, 0, 3, 0, 0]
> {code}
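A hedged, Spark-free illustration of one way this skew can arise (an assumption about the cause, not a confirmed diagnosis): if the result were hash-partitioned by the product id, a single product would send every record to one bucket, consistent with the `[0, 0, 3, 0, 0]` output above.

```python
def hash_partition(keys, num_partitions):
    """Count how many records each hash partition would receive."""
    sizes = [0] * num_partitions
    for k in keys:
        sizes[hash(k) % num_partitions] += 1
    return sizes

# Five predictions, all for product 2: every record lands in one bucket.
print(hash_partition([2] * 5, 5))  # [0, 0, 5, 0, 0]
```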






[jira] [Commented] (SPARK-8743) Deregister Codahale metrics for streaming when StreamingContext is closed

2015-07-01 Thread Tathagata Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611258#comment-14611258
 ] 

Tathagata Das commented on SPARK-8743:
--

I have assigned this to you.

> Deregister Codahale metrics for streaming when StreamingContext is closed 
> --
>
> Key: SPARK-8743
> URL: https://issues.apache.org/jira/browse/SPARK-8743
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 1.4.1
>Reporter: Tathagata Das
>Assignee: Neelesh Srinivas Salian
>  Labels: starter
>
> Currently, when the StreamingContext is closed, the registered metrics are 
> not deregistered. If another streaming context is started, it throws a 
> warning saying that the metrics are already registered. 
> The solution is to deregister the metrics when streamingcontext is stopped.
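A minimal sketch of the proposed fix (plain Python; the registry stands in for a Codahale MetricRegistry, and all names are hypothetical): register metrics on start and remove them on stop, so a second context can reuse the same names without warnings.

```python
class MetricsRegistry:
    """Stand-in for a Codahale-style registry that rejects duplicate names."""
    def __init__(self):
        self.metrics = {}
    def register(self, name, metric):
        if name in self.metrics:
            raise ValueError(f"metric already registered: {name}")
        self.metrics[name] = metric

class StreamingContext:
    def __init__(self, registry):
        self.registry = registry
        self.names = []
    def start(self):
        for name in ("receivers", "totalCompletedBatches"):
            self.registry.register(name, object())
            self.names.append(name)
    def stop(self):
        # The proposed fix: deregister everything this context registered.
        for name in self.names:
            self.registry.metrics.pop(name, None)
        self.names.clear()

reg = MetricsRegistry()
first = StreamingContext(reg)
first.start()
first.stop()
second = StreamingContext(reg)
second.start()  # no "already registered" error after the first stop()
```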






[jira] [Updated] (SPARK-8743) Deregister Codahale metrics for streaming when StreamingContext is closed

2015-07-01 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-8743:
-
Assignee: Neelesh Srinivas Salian

> Deregister Codahale metrics for streaming when StreamingContext is closed 
> --
>
> Key: SPARK-8743
> URL: https://issues.apache.org/jira/browse/SPARK-8743
> Project: Spark
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 1.4.1
>Reporter: Tathagata Das
>Assignee: Neelesh Srinivas Salian
>  Labels: starter
>
> Currently, when the StreamingContext is closed, the registered metrics are 
> not deregistered. If another streaming context is started, it throws a 
> warning saying that the metrics are already registered. 
> The solution is to deregister the metrics when streamingcontext is stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier

2015-07-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611244#comment-14611244
 ] 

Joseph K. Bradley commented on SPARK-8069:
--

I like the idea of including it in an abstraction like ClassificationModel and 
ProbabilisticClassificationModel, unless it is too difficult.  If a developer 
does not want to support thresholds/cutoffs (or wants to modify the API), the 
developer does not have to use the abstraction.

The main difficulty I see is in trying to specify thresholds in a uniform way:
* Thresholding rawPrediction vs. probability: It would be easy to mimic the R 
randomForest package for thresholding probabilities, for which we know which 
values are in the range [0,1].  That won't work well for rawPrediction values, 
which could be negative.
** We could initially only support thresholding for 
ProbabilisticClassificationModel.  I expect to modify trees & tree ensembles to 
subclass ProbabilisticClassificationModel in release 1.5 (WIP).
** Do you have ideas for thresholding for rawPrediction?
* Binary vs. multiclass: It would be nice to think of a way to naturally 
support binary, though it might mean modifying or deprecating HasThreshold.  
Once we decide on a good way to specify thresholds, then perhaps the binary 
case can be handled by providing a setter as in HasThreshold 
({{setThreshold(value: Double)}}) but returning the generalized threshold in 
the getter ({{Vector getThreshold}}).
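A hedged sketch of the R randomForest-style cutoff semantics discussed above (plain Python, illustrative values; not Spark's API): predict the class maximizing probability divided by its cutoff, rather than the raw argmax.

```python
def predict_with_cutoff(probabilities, cutoffs):
    """Return the class index maximizing p_i / cutoff_i (R randomForest style)."""
    ratios = [p / c for p, c in zip(probabilities, cutoffs)]
    return max(range(len(ratios)), key=ratios.__getitem__)

# Equal cutoffs reduce to the plain argmax:
print(predict_with_cutoff([0.6, 0.4], [0.5, 0.5]))  # 0
# A low cutoff for class 1 makes it easier to predict:
print(predict_with_cutoff([0.6, 0.4], [0.9, 0.1]))  # 1
```

As noted above, this only makes sense for values in [0, 1], which is why thresholding rawPrediction (possibly negative) needs a different formulation.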

> Add support for cutoff to RandomForestClassifier
> 
>
> Key: SPARK-8069
> URL: https://issues.apache.org/jira/browse/SPARK-8069
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Priority: Minor
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Consider adding support for cutoffs similar to 
> http://cran.r-project.org/web/packages/randomForest/randomForest.pdf 






[jira] [Commented] (SPARK-5222) YARN client and cluster modes have different app name behaviors

2015-07-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611241#comment-14611241
 ] 

Andrew Or commented on SPARK-5222:
--

I'm inclined to close this as a "Won't Fix" because the user can set the app 
name in his/her main method, in which case there's nothing we can do because 
cluster mode will not have access to the name until later. I'll let this sit 
for a few more days to see if others have any objections to this resolution.

> YARN client and cluster modes have different app name behaviors
> ---
>
> Key: SPARK-5222
> URL: https://issues.apache.org/jira/browse/SPARK-5222
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Tao Wang
>Priority: Minor
>
> The behavior is summarized in a table produced by [~WangTaoTheTonic] here: 
> https://github.com/apache/spark/pull/3557
> SPARK_YARN_APP_NAME is respected only in client mode but not in cluster mode. 
> This results in the strange behavior where the app name changes if the user 
> runs the same application but uses a different deploy mode from before. We 
> should make sure the app name behavior is consistent across deploy modes 
> regardless of what variable or config is set.
> Additionally, it should be noted that because "spark.app.name" is required of 
> all applications, the setting of "SPARK_YARN_APP_NAME" will not take effect 
> unless we handle it preemptively in Spark submit.
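The preemptive handling described above can be sketched as a single resolution applied once at submit time, before either deploy mode runs (plain Python, hypothetical helper; not Spark submit's actual code):

```python
def resolve_app_name(conf, env, default="Spark app"):
    """Resolve the app name once, so client and cluster modes agree."""
    # spark.app.name wins; SPARK_YARN_APP_NAME is a YARN-only fallback.
    return conf.get("spark.app.name") or env.get("SPARK_YARN_APP_NAME") or default

print(resolve_app_name({}, {"SPARK_YARN_APP_NAME": "my-job"}))  # my-job
print(resolve_app_name({"spark.app.name": "Main"},
                       {"SPARK_YARN_APP_NAME": "my-job"}))       # Main
```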






[jira] [Updated] (SPARK-8016) YARN cluster / client modes have different app names for python

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8016:
-
Priority: Minor  (was: Major)

> YARN cluster / client modes have different app names for python
> ---
>
> Key: SPARK-8016
> URL: https://issues.apache.org/jira/browse/SPARK-8016
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, YARN
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Priority: Minor
> Attachments: python.png
>
>
> See screenshot.






[jira] [Updated] (SPARK-5222) YARN client and cluster modes have different app name behaviors

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5222:
-
Priority: Minor  (was: Major)

> YARN client and cluster modes have different app name behaviors
> ---
>
> Key: SPARK-5222
> URL: https://issues.apache.org/jira/browse/SPARK-5222
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Tao Wang
>Priority: Minor
>
> The behavior is summarized in a table produced by [~WangTaoTheTonic] here: 
> https://github.com/apache/spark/pull/3557
> SPARK_YARN_APP_NAME is respected only in client mode but not in cluster mode. 
> This results in the strange behavior where the app name changes if the user 
> runs the same application but uses a different deploy mode from before. We 
> should make sure the app name behavior is consistent across deploy modes 
> regardless of what variable or config is set.
> Additionally, it should be noted that because "spark.app.name" is required of 
> all applications, the setting of "SPARK_YARN_APP_NAME" will not take effect 
> unless we handle it preemptively in Spark submit.






[jira] [Assigned] (SPARK-8774) Add R model formula with basic support as a transformer

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-8774:


Assignee: Xiangrui Meng

> Add R model formula with basic support as a transformer
> ---
>
> Key: SPARK-8774
> URL: https://issues.apache.org/jira/browse/SPARK-8774
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SparkR
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> To have better integration with SparkR, we can add a feature transformer to 
> support R formula. A list of operators R supports can be found here: 
> http://ww2.coastal.edu/kingw/statistics/R-tutorials/formulae.html
> The initial version should support "~", "+", and "." on numeric columns and 
> we can expand it in the future.
> {code}
> val formula = new RModelFormula()
>   .setFormula("y ~ x + z")
> {code}
> The output should append two new columns: features and label.






[jira] [Commented] (SPARK-8069) Add support for cutoff to RandomForestClassifier

2015-07-01 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611224#comment-14611224
 ] 

holdenk commented on SPARK-8069:


So I started working on this for RandomForestClassifier. I think we could add 
this as a trait for multiclass classification models that return all of the 
class scores (the current API of the multiclass classification models seems to 
return only the winning class). What if we added a trait for this and the 
other models implemented it as needed? Alternatively I could change the base 
PredictionModel, but that would probably be a pretty large change.
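A standalone sketch of what such a trait could look like. This is illustrative only, not the Spark ML API; names like `HasClassScores` and `predictScores` are hypothetical, and the toy model exists just to exercise the trait.

```scala
// Sketch of a mixin trait for classifiers that can expose per-class scores,
// so a cutoff can be applied downstream. Names are illustrative, not the
// real Spark ML API.
trait HasClassScores {
  // Raw score for each class, indexed by class label.
  def predictScores(features: Array[Double]): Array[Double]

  // Pick the winning class after scaling scores by per-class cutoffs,
  // mirroring R randomForest's `cutoff` parameter.
  def predictWithCutoff(features: Array[Double], cutoff: Array[Double]): Int = {
    val adjusted = predictScores(features).zip(cutoff).map { case (s, c) => s / c }
    adjusted.indexOf(adjusted.max)
  }
}

// Toy model with fixed scores, just to exercise the trait.
class ToyClassifier(scores: Array[Double]) extends HasClassScores {
  def predictScores(features: Array[Double]): Array[Double] = scores
}
```

Models that cannot produce scores would simply not mix the trait in, which avoids touching the base PredictionModel.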

> Add support for cutoff to RandomForestClassifier
> 
>
> Key: SPARK-8069
> URL: https://issues.apache.org/jira/browse/SPARK-8069
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: holdenk
>Priority: Minor
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Consider adding support for cutoffs similar to 
> http://cran.r-project.org/web/packages/randomForest/randomForest.pdf 






[jira] [Created] (SPARK-8774) Add R model formula with basic support as a transformer

2015-07-01 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-8774:


 Summary: Add R model formula with basic support as a transformer
 Key: SPARK-8774
 URL: https://issues.apache.org/jira/browse/SPARK-8774
 Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
Reporter: Xiangrui Meng


To have better integration with SparkR, we can add a feature transformer to 
support R formula. A list of operators R supports can be found here: 
http://ww2.coastal.edu/kingw/statistics/R-tutorials/formulae.html

The initial version should support "~", "+", and "." on numeric columns and we 
can expand it in the future.

{code}
val formula = new RModelFormula()
  .setFormula("y ~ x + z")
{code}

The output should append two new columns: features and label.
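A minimal sketch of the parsing side, covering only the "~" and "+" operators from the initial version. The names `RFormulaParser` and `ParsedFormula` are illustrative, not the proposed API.

```scala
// Parse an R-style formula string like "y ~ x + z" into a label column name
// and a list of feature terms. Supports only "~" and "+"; illustrative, not
// the actual Spark implementation.
case class ParsedFormula(label: String, terms: Seq[String])

object RFormulaParser {
  def parse(formula: String): ParsedFormula = {
    val Array(lhs, rhs) = formula.split("~").map(_.trim)
    val terms = rhs.split("\\+").map(_.trim).toSeq
    ParsedFormula(lhs, terms)
  }
}
```

The transformer would then assemble the `terms` columns into a "features" vector and copy the `label` column into "label".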






[jira] [Updated] (SPARK-6805) ML Pipeline API in SparkR

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-6805:
-
Assignee: (was: Xiangrui Meng)

> ML Pipeline API in SparkR
> -
>
> Key: SPARK-6805
> URL: https://issues.apache.org/jira/browse/SPARK-6805
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, SparkR
>Reporter: Xiangrui Meng
>
> SparkR was merged. So let's have this umbrella JIRA for the ML pipeline API 
> in SparkR. The implementation should be similar to the pipeline API 
> implementation in Python.






[jira] [Assigned] (SPARK-6805) ML Pipeline API in SparkR

2015-07-01 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-6805:


Assignee: Xiangrui Meng

> ML Pipeline API in SparkR
> -
>
> Key: SPARK-6805
> URL: https://issues.apache.org/jira/browse/SPARK-6805
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, SparkR
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>
> SparkR was merged. So let's have this umbrella JIRA for the ML pipeline API 
> in SparkR. The implementation should be similar to the pipeline API 
> implementation in Python.






[jira] [Resolved] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-8770.

   Resolution: Fixed
Fix Version/s: 1.5.0

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 1.5.0
>
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Created] (SPARK-8773) Throw type mismatch in check analysis for expressions with expected input types defined

2015-07-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8773:
--

 Summary: Throw type mismatch in check analysis for expressions 
with expected input types defined
 Key: SPARK-8773
 URL: https://issues.apache.org/jira/browse/SPARK-8773
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin









[jira] [Created] (SPARK-8772) Implement implicit type cast for expressions that define expected input types

2015-07-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8772:
--

 Summary: Implement implicit type cast for expressions that define 
expected input types
 Key: SPARK-8772
 URL: https://issues.apache.org/jira/browse/SPARK-8772
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


We should have an engine-wide implicit cast rule defined.
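A toy sketch of such a rule: when an expression declares expected input types, wrap any mismatched argument in a cast node. This is a simplified stand-in, not the Catalyst implementation; all names here are hypothetical.

```scala
// Minimal model of an engine-wide implicit cast rule. Arguments whose type
// already matches the expected input type pass through; others are wrapped
// in a cast. Illustrative only.
sealed trait DType
case object DInt extends DType
case object DString extends DType

case class Expr(name: String, dtype: DType)

object ImplicitCasts {
  def cast(e: Expr, to: DType): Expr = Expr(s"cast(${e.name})", to)

  // Apply the rule at one call site: args vs. declared expected types.
  def coerce(args: Seq[Expr], expected: Seq[DType]): Seq[Expr] =
    args.zip(expected).map {
      case (a, t) if a.dtype == t => a
      case (a, t)                 => cast(a, t)
    }
}
```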






[jira] [Resolved] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-8766.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7165
[https://github.com/apache/spark/pull/7165]

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 1.5.0
>
>







[jira] [Updated] (SPARK-6707) Mesos Scheduler should allow the user to specify constraints based on slave attributes

2015-07-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-6707:
-
Target Version/s: 1.5.0

> Mesos Scheduler should allow the user to specify constraints based on slave 
> attributes
> --
>
> Key: SPARK-6707
> URL: https://issues.apache.org/jira/browse/SPARK-6707
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 1.3.0
>Reporter: Ankur Chauhan
>  Labels: mesos, scheduler
>
> Currently, the Mesos scheduler only looks at the 'cpu' and 'mem' resources 
> when trying to determine the usability of a resource offer from a Mesos 
> slave node. It may be preferable for the user to be able to ensure that 
> Spark jobs are only started on a certain set of nodes (based on attributes). 
> For example, if the user sets a property, say 
> {code}spark.mesos.constraints{code}, to 
> {code}tachyon=true;us-east-1=false{code}, then the resource offers will be 
> checked to see whether they meet both of these constraints, and only then 
> will they be accepted to start new executors.
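A standalone sketch of the proposed behavior: parse the constraint spec into a map and test a slave's attributes against it. The parsing format follows the `key=value;key=value` example above; the helper names are hypothetical, not the Spark Mesos scheduler code.

```scala
// Parse a constraint spec like "tachyon=true;us-east-1=false" into a map,
// then check a slave's attributes against it. Illustrative only.
object MesosConstraints {
  def parse(spec: String): Map[String, String] =
    spec.split(";").filter(_.nonEmpty).map { kv =>
      val Array(k, v) = kv.split("=", 2)
      k.trim -> v.trim
    }.toMap

  // An offer is usable only if every constraint matches the offer's attributes.
  def matches(constraints: Map[String, String], attrs: Map[String, String]): Boolean =
    constraints.forall { case (k, v) => attrs.get(k).contains(v) }
}
```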






[jira] [Commented] (SPARK-7079) Cache-aware external sort

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611141#comment-14611141
 ] 

Apache Spark commented on SPARK-7079:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/6444

> Cache-aware external sort
> -
>
> Key: SPARK-7079
> URL: https://issues.apache.org/jira/browse/SPARK-7079
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>







[jira] [Assigned] (SPARK-7078) Cache-aware binary processing in-memory sort

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7078:
---

Assignee: Apache Spark  (was: Josh Rosen)

> Cache-aware binary processing in-memory sort
> 
>
> Key: SPARK-7078
> URL: https://issues.apache.org/jira/browse/SPARK-7078
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> A cache-friendly sort algorithm that can be used eventually for:
> * sort-merge join
> * shuffle
> See the old alpha sort paper: 
> http://research.microsoft.com/pubs/68249/alphasort.doc
> Note that state-of-the-art for sorting has improved quite a bit, but we can 
> easily optimize the sorting algorithm itself later.






[jira] [Commented] (SPARK-7078) Cache-aware binary processing in-memory sort

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611140#comment-14611140
 ] 

Apache Spark commented on SPARK-7078:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/6444

> Cache-aware binary processing in-memory sort
> 
>
> Key: SPARK-7078
> URL: https://issues.apache.org/jira/browse/SPARK-7078
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>
> A cache-friendly sort algorithm that can be used eventually for:
> * sort-merge join
> * shuffle
> See the old alpha sort paper: 
> http://research.microsoft.com/pubs/68249/alphasort.doc
> Note that state-of-the-art for sorting has improved quite a bit, but we can 
> easily optimize the sorting algorithm itself later.






[jira] [Assigned] (SPARK-7079) Cache-aware external sort

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7079:
---

Assignee: Josh Rosen  (was: Apache Spark)

> Cache-aware external sort
> -
>
> Key: SPARK-7079
> URL: https://issues.apache.org/jira/browse/SPARK-7079
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>







[jira] [Assigned] (SPARK-7079) Cache-aware external sort

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7079:
---

Assignee: Apache Spark  (was: Josh Rosen)

> Cache-aware external sort
> -
>
> Key: SPARK-7079
> URL: https://issues.apache.org/jira/browse/SPARK-7079
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>







[jira] [Assigned] (SPARK-7078) Cache-aware binary processing in-memory sort

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7078:
---

Assignee: Josh Rosen  (was: Apache Spark)

> Cache-aware binary processing in-memory sort
> 
>
> Key: SPARK-7078
> URL: https://issues.apache.org/jira/browse/SPARK-7078
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>
> A cache-friendly sort algorithm that can be used eventually for:
> * sort-merge join
> * shuffle
> See the old alpha sort paper: 
> http://research.microsoft.com/pubs/68249/alphasort.doc
> Note that state-of-the-art for sorting has improved quite a bit, but we can 
> easily optimize the sorting algorithm itself later.






[jira] [Commented] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1460#comment-1460
 ] 

Apache Spark commented on SPARK-8771:
-

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/7172

> Actor system deprecation tag uses deprecated deprecation tag
> 
>
> Key: SPARK-8771
> URL: https://issues.apache.org/jira/browse/SPARK-8771
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Trivial
>
> The deprecation of the actor system adds a spurious build warning:
> {quote}
> @deprecated now takes two arguments; see the scaladoc.
> [warn]   @deprecated("Actor system is no longer supported as of 1.4")
> {quote}






[jira] [Assigned] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8771:
---

Assignee: (was: Apache Spark)

> Actor system deprecation tag uses deprecated deprecation tag
> 
>
> Key: SPARK-8771
> URL: https://issues.apache.org/jira/browse/SPARK-8771
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Trivial
>
> The deprecation of the actor system adds a spurious build warning:
> {quote}
> @deprecated now takes two arguments; see the scaladoc.
> [warn]   @deprecated("Actor system is no longer supported as of 1.4")
> {quote}






[jira] [Assigned] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8771:
---

Assignee: Apache Spark

> Actor system deprecation tag uses deprecated deprecation tag
> 
>
> Key: SPARK-8771
> URL: https://issues.apache.org/jira/browse/SPARK-8771
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> The deprecation of the actor system adds a spurious build warning:
> {quote}
> @deprecated now takes two arguments; see the scaladoc.
> [warn]   @deprecated("Actor system is no longer supported as of 1.4")
> {quote}






[jira] [Commented] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611107#comment-14611107
 ] 

Apache Spark commented on SPARK-8769:
-

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/7171

> toLocalIterator should mention it results in many jobs
> --
>
> Key: SPARK-8769
> URL: https://issues.apache.org/jira/browse/SPARK-8769
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: holdenk
>Priority: Trivial
>
> toLocalIterator on RDDs should mention that it results in multiple jobs, 
> and that, to avoid re-computation when the input is the result of a 
> wide transformation, the input should be cached.






[jira] [Assigned] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8769:
---

Assignee: (was: Apache Spark)

> toLocalIterator should mention it results in many jobs
> --
>
> Key: SPARK-8769
> URL: https://issues.apache.org/jira/browse/SPARK-8769
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: holdenk
>Priority: Trivial
>
> toLocalIterator on RDDs should mention that it results in multiple jobs, 
> and that, to avoid re-computation when the input is the result of a 
> wide transformation, the input should be cached.






[jira] [Assigned] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8769:
---

Assignee: Apache Spark

> toLocalIterator should mention it results in many jobs
> --
>
> Key: SPARK-8769
> URL: https://issues.apache.org/jira/browse/SPARK-8769
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>
> toLocalIterator on RDDs should mention that it results in multiple jobs, 
> and that, to avoid re-computation when the input is the result of a 
> wide transformation, the input should be cached.






[jira] [Assigned] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8770:
---

Assignee: Reynold Xin  (was: Apache Spark)

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Assigned] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8770:
---

Assignee: Apache Spark  (was: Reynold Xin)

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Updated] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-8770:

Shepherd: Michael Armbrust

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Commented] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611101#comment-14611101
 ] 

Apache Spark commented on SPARK-8770:
-

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/7170

> BinaryOperator expression
> -
>
> Key: SPARK-8770
> URL: https://issues.apache.org/jira/browse/SPARK-8770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Our current BinaryExpression abstract class is not for generic binary 
> expressions, i.e. it requires left/right children to have the same type. 
> However, due to its name, contributors build new binary expressions that 
> don't have that assumption (e.g. Sha) and still extend BinaryExpression.
> We should create a new BinaryOperator abstract class with this assumption, 
> and update the analyzer to only apply type casting rule there.






[jira] [Created] (SPARK-8770) BinaryOperator expression

2015-07-01 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-8770:
--

 Summary: BinaryOperator expression
 Key: SPARK-8770
 URL: https://issues.apache.org/jira/browse/SPARK-8770
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


Our current BinaryExpression abstract class is not for generic binary 
expressions, i.e. it requires left/right children to have the same type. 
However, due to its name, contributors build new binary expressions that don't 
have that assumption (e.g. Sha) and still extend BinaryExpression.

We should create a new BinaryOperator abstract class with this assumption, and 
update the analyzer to only apply type casting rule there.
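A toy model of the proposed split. `BinaryExpression` stays generic, while `BinaryOperator` additionally requires both children to conform to a single declared input type, so the analyzer can restrict its type-casting rule to operators. This is a simplified sketch, not the actual Catalyst classes.

```scala
// Simplified expression hierarchy illustrating the proposal. Generic binary
// expressions (e.g. Sha) extend BinaryExpression; operators with a shared
// input type extend BinaryOperator.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

abstract class Expression { def dataType: DataType }

// Generic binary expression: children may have different types.
abstract class BinaryExpression extends Expression {
  def left: Expression
  def right: Expression
}

// Operator: both children must share one expected input type, which the
// analyzer can use when deciding where to insert implicit casts.
abstract class BinaryOperator extends BinaryExpression {
  def inputType: DataType
  def typeChecks: Boolean =
    left.dataType == inputType && right.dataType == inputType
}

case class Literal(dataType: DataType) extends Expression
case class Add(left: Expression, right: Expression) extends BinaryOperator {
  val inputType: DataType = IntType
  val dataType: DataType = IntType
}
```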







[jira] [Created] (SPARK-8771) Actor system deprecation tag uses deprecated deprecation tag

2015-07-01 Thread holdenk (JIRA)
holdenk created SPARK-8771:
--

 Summary: Actor system deprecation tag uses deprecated deprecation 
tag
 Key: SPARK-8771
 URL: https://issues.apache.org/jira/browse/SPARK-8771
 Project: Spark
  Issue Type: Improvement
Reporter: holdenk
Priority: Trivial


The deprecation of the actor system adds a spurious build warning:
{quote}
@deprecated now takes two arguments; see the scaladoc.
[warn]   @deprecated("Actor system is no longer supported as of 1.4")
{quote}
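The warning is about the one-argument form of Scala's `@deprecated`; passing the message plus the "since" version silences it. A minimal illustration (the object and method names here are made up for the example):

```scala
// The two-argument form of @deprecated: message plus the version since
// which the symbol is deprecated. This avoids the build warning quoted above.
object ActorSystemExample {
  @deprecated("Actor system is no longer supported as of 1.4", "1.4")
  def actorSystem: String = "legacy"

  // Non-deprecated member, included so the snippet can be exercised
  // without triggering a deprecation warning.
  def status: String = "ok"
}
```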






[jira] [Updated] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-8766:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-6116

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>







[jira] [Updated] (SPARK-8769) toLocalIterator should mention it results in many jobs

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8769:
-
Component/s: Documentation

> toLocalIterator should mention it results in many jobs
> --
>
> Key: SPARK-8769
> URL: https://issues.apache.org/jira/browse/SPARK-8769
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: holdenk
>Priority: Trivial
>
> toLocalIterator on RDDs should mention that it results in multiple jobs, 
> and that, to avoid re-computation when the input is the result of a 
> wide transformation, the input should be cached.






[jira] [Updated] (SPARK-8766) DataFrame Python API should work with column which has non-ascii character in it

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8766:
-
Component/s: PySpark

> DataFrame Python API should work with column which has non-ascii character in 
> it
> 
>
> Key: SPARK-8766
> URL: https://issues.apache.org/jira/browse/SPARK-8766
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>







[jira] [Commented] (SPARK-8753) Create an IntervalType data type

2015-07-01 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611067#comment-14611067
 ] 

holdenk commented on SPARK-8753:


I could give this a shot if people want :)

> Create an IntervalType data type
> 
>
> Key: SPARK-8753
> URL: https://issues.apache.org/jira/browse/SPARK-8753
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> We should create an IntervalType data type that represents time intervals. 
> Internally, we can use a long value to store it, similar to Timestamp (i.e. 
> 100ns precision). Initially this data type cannot be stored externally, but 
> can only be used in expressions.
> 1. Add IntervalType data type.
> 2. Add parser support in our SQL expression, in the form of
> {code}
> INTERVAL [number] [unit] 
> {code}
> unit can be YEAR[S], MONTH[S], WEEK[S], DAY[S], HOUR[S], MINUTE[S], 
> SECOND[S], MILLISECOND[S], MICROSECOND[S], or NANOSECOND[S].
> 3. Add a check in the analyzer that throws an exception to prevent saving 
> a dataframe/table with IntervalType out to external systems.
> Related Hive ticket: https://issues.apache.org/jira/browse/HIVE-9792
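The literal syntax in step 2 can be sketched as a small parser (illustrative only, not Spark's actual implementation; the regex, the unit table, and the 30-day/365-day approximations for MONTH and YEAR are all assumptions made for this sketch):

```python
import re

# Nanoseconds per unit. MONTH and YEAR use flat 30-day / 365-day
# approximations purely for illustration; a real calendar-aware interval
# type would have to treat them differently.
_NS_PER_UNIT = {
    "NANOSECOND": 1,
    "MICROSECOND": 1_000,
    "MILLISECOND": 1_000_000,
    "SECOND": 10**9,
    "MINUTE": 60 * 10**9,
    "HOUR": 3600 * 10**9,
    "DAY": 24 * 3600 * 10**9,
    "WEEK": 7 * 24 * 3600 * 10**9,
    "MONTH": 30 * 24 * 3600 * 10**9,
    "YEAR": 365 * 24 * 3600 * 10**9,
}

def parse_interval(literal):
    """Parse 'INTERVAL <number> <unit>[S]' into a nanosecond count."""
    m = re.fullmatch(r"INTERVAL\s+(-?\d+)\s+([A-Z]+?)S?",
                     literal.strip().upper())
    if not m or m.group(2) not in _NS_PER_UNIT:
        raise ValueError("not a valid interval literal: %r" % literal)
    return int(m.group(1)) * _NS_PER_UNIT[m.group(2)]
```

For example, `parse_interval("INTERVAL 3 DAYS")` and `parse_interval("INTERVAL 3 DAY")` both yield the same nanosecond count, matching the optional-S unit spelling described above.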






[jira] [Updated] (SPARK-8763) executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

2015-07-01 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8763:
-
Assignee: Tomohiko K.

> executing run-tests.py with Python 2.6 fails with absence of 
> subprocess.check_output function
> -
>
> Key: SPARK-8763
> URL: https://issues.apache.org/jira/browse/SPARK-8763
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: Mac OS X 10.10.3, Python 2.6.9, Java 1.8.0 
>Reporter: Tomohiko K.
>Assignee: Tomohiko K.
>  Labels: pyspark, testing
> Fix For: 1.5.0
>
>
> Running run-tests.py with Python 2.6 causes the following error:
> {noformat}
> Running PySpark tests. Output is in 
> python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
> Will test against the following Python executables: ['python2.6', 
> 'python3.4', 'pypy']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Traceback (most recent call last):
>   File "./python/run-tests.py", line 196, in 
> main()
>   File "./python/run-tests.py", line 159, in main
> python_implementation = subprocess.check_output(
> AttributeError: 'module' object has no attribute 'check_output'
> ...
> {noformat}
> The cause of this error is the use of the subprocess.check_output function, 
> which has only existed since Python 2.7.
> (ref. 
> https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)
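On Python 2.6 the missing function can be backported on top of subprocess.Popen (available since Python 2.4), roughly mirroring the 2.7 standard library implementation. This is a hedged sketch of one common workaround, not necessarily how run-tests.py resolved the issue:

```python
import subprocess
import sys

# subprocess.check_output only exists since Python 2.7. On 2.6, install a
# compatible fallback built from subprocess.Popen. This mirrors the 2.7
# stdlib version, minus the `output` attribute that 2.6's
# CalledProcessError constructor does not accept.
if not hasattr(subprocess, "check_output"):
    def _check_output(*popenargs, **kwargs):
        process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
        output, _ = process.communicate()
        retcode = process.poll()
        if retcode:
            cmd = kwargs.get("args") or popenargs[0]
            raise subprocess.CalledProcessError(retcode, cmd)
        return output
    subprocess.check_output = _check_output

# Works the same on 2.6 (via the fallback) and on 2.7+ (via the stdlib).
out = subprocess.check_output([sys.executable, "-c", "print('ok')"])
```

With this shim in place, the `python_implementation = subprocess.check_output(...)` call in run-tests.py would no longer raise AttributeError under Python 2.6.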





