[jira] [Updated] (SPARK-1441) compile Spark Core error with Hadoop 0.23.x

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1441:
-

Summary: compile Spark Core error with Hadoop 0.23.x  (was: Spark Core 
build error with Hadoop 0.23.x)

> compile Spark Core error with Hadoop 0.23.x
> ---
>
> Key: SPARK-1441
> URL: https://issues.apache.org/jira/browse/SPARK-1441
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: witgo
> Attachments: mvn.log, sbt.log
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1428) MLlib should convert non-float64 NumPy arrays to float64 instead of complaining

2014-04-07 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962636#comment-13962636
 ] 

Sandeep Singh commented on SPARK-1428:
--

This should work: https://github.com/apache/spark/pull/356

> MLlib should convert non-float64 NumPy arrays to float64 instead of 
> complaining
> ---
>
> Key: SPARK-1428
> URL: https://issues.apache.org/jira/browse/SPARK-1428
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: Starter
>
> Pretty easy to fix; it would avoid spewing some scary task-failed errors. The 
> place to fix this is _serialize_double_vector in _common.py.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1441) Spark Core build error with Hadoop 0.23.x

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1441:
-

Summary: Spark Core build error with Hadoop 0.23.x  (was: Spark Core with 
Hadoop 0.23.X error)

> Spark Core build error with Hadoop 0.23.x
> -
>
> Key: SPARK-1441
> URL: https://issues.apache.org/jira/browse/SPARK-1441
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: witgo
> Attachments: mvn.log, sbt.log
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1103) Garbage collect RDD information inside of Spark

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1103.


Resolution: Fixed

> Garbage collect RDD information inside of Spark
> ---
>
> Key: SPARK-1103
> URL: https://issues.apache.org/jira/browse/SPARK-1103
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Patrick Wendell
>Assignee: Tathagata Das
>Priority: Blocker
> Fix For: 1.0.0
>
>
> When Spark jobs run for a long period of time, state accumulates. This is 
> dealt with now using TTL-based cleaning. Instead we should do proper garbage 
> collection using weak references.
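A minimal sketch of the weak-reference idea, assuming illustrative names (RDDCleaner, doCleanupRDD) rather than Spark's actual internals: register a WeakReference per RDD against a ReferenceQueue, and clean up associated state once the reference is enqueued, i.e. once the RDD is no longer reachable.

{code}
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical sketch, not Spark's actual cleaner: just the weak-reference pattern.
class RDDCleaner {
  private case class CleanupTask(rddId: Int)
  // Keep the WeakReference objects themselves strongly reachable until processed.
  private val referenceBuffer = mutable.Set.empty[WeakReference[AnyRef]]
  private val referenceQueue = new ReferenceQueue[AnyRef]
  private val taskFor = mutable.Map.empty[WeakReference[AnyRef], CleanupTask]

  /** Call when an RDD is created; the RDD is only weakly referenced here. */
  def registerRDD(rdd: AnyRef, rddId: Int): Unit = {
    val ref = new WeakReference[AnyRef](rdd, referenceQueue)
    referenceBuffer += ref
    taskFor(ref) = CleanupTask(rddId)
  }

  /** Poll the queue; an enqueued reference means its RDD has been garbage collected. */
  def cleanUp(): Unit = {
    var ref = referenceQueue.poll()
    while (ref != null) {
      val weakRef = ref.asInstanceOf[WeakReference[AnyRef]]
      taskFor.remove(weakRef).foreach(task => doCleanupRDD(task.rddId))
      referenceBuffer -= weakRef
      ref = referenceQueue.poll()
    }
  }

  private def doCleanupRDD(rddId: Int): Unit =
    println(s"Removing cached blocks and metadata for RDD $rddId")
}
{code}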



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1436) Compression code broke in-memory store

2014-04-07 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962615#comment-13962615
 ] 

Cheng Lian commented on SPARK-1436:
---

Fixed in [this 
commit|https://github.com/liancheng/spark/commit/1d037b83191099da961c247a57ef686cb508c447]
 of PR [#330|https://github.com/apache/spark/pull/330]

> Compression code broke in-memory store
> --
>
> Key: SPARK-1436
> URL: https://issues.apache.org/jira/browse/SPARK-1436
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.0.0
>
>
> See my following comment...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962601#comment-13962601
 ] 

Sandeep Singh commented on SPARK-1433:
--

Pull request https://github.com/apache/spark/pull/355

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> Mesos 0.13.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1415) Add a minSplits parameter to wholeTextFiles

2014-04-07 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962585#comment-13962585
 ] 

Matei Zaharia commented on SPARK-1415:
--

Hey Xusen, that makes sense. I think that for consistency with our other API 
methods we should add minSplits here, and we can compute maxSplitSize from it. 
Later on we can add versions of the methods that take a maxSplitSize. With the 
old Hadoop API, for example, we can't easily change this, and a maxSplitSize 
can always be computed from minSplits.
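A rough sketch of the computation described above; the names totalLen and minSplits are illustrative parameters, not the signature Spark actually uses.

{code}
// Hypothetical helper: derive a maximum split size from a requested minimum split count.
def maxSplitSizeFor(totalLen: Long, minSplits: Int): Long = {
  val splits = math.max(1, minSplits)
  // Floor division: with this cap, ceil(totalLen / maxSplitSize) >= minSplits.
  math.max(1L, totalLen / splits)
}

// e.g. 10 GB of input with minSplits = 8 gives a cap of 1.25 GB per split.
val cap = maxSplitSizeFor(10L * 1024 * 1024 * 1024, 8)
{code}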

> Add a minSplits parameter to wholeTextFiles
> ---
>
> Key: SPARK-1415
> URL: https://issues.apache.org/jira/browse/SPARK-1415
> Project: Spark
>  Issue Type: Bug
>Reporter: Matei Zaharia
>Assignee: Xusen Yin
>  Labels: Starter
>
> This probably requires adding one to newAPIHadoopFile too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962581#comment-13962581
 ] 

Sandeep Singh commented on SPARK-1433:
--

[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .. SUCCESS [2.002s]
[INFO] Spark Project Core  SUCCESS [30.635s]
[INFO] Spark Project Bagel ... SUCCESS [0.883s]
[INFO] Spark Project GraphX .. SUCCESS [0.829s]
[INFO] Spark Project ML Library .. SUCCESS [0.805s]
[INFO] Spark Project Streaming ... SUCCESS [0.911s]
[INFO] Spark Project Tools ... SUCCESS [0.645s]
[INFO] Spark Project Catalyst  SUCCESS [0.897s]
[INFO] Spark Project SQL . SUCCESS [1.193s]
[INFO] Spark Project Hive  SUCCESS [1.541s]
[INFO] Spark Project REPL  SUCCESS [1.164s]
[INFO] Spark Project Assembly  SUCCESS [1.729s]
[INFO] Spark Project External Twitter  SUCCESS [0.809s]
[INFO] Spark Project External Kafka .. SUCCESS [0.591s]
[INFO] Spark Project External Flume .. SUCCESS [0.696s]
[INFO] Spark Project External ZeroMQ . SUCCESS [0.484s]
[INFO] Spark Project External MQTT ... SUCCESS [0.543s]
[INFO] Spark Project Examples  SUCCESS [2.385s]

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> Mesos 0.13.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1391) BlockManager cannot transfer blocks larger than 2G in size

2014-04-07 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962563#comment-13962563
 ] 

Shivaram Venkataraman commented on SPARK-1391:
--

Sorry didn't get a chance to try this yet. Will try to do it tomorrow

> BlockManager cannot transfer blocks larger than 2G in size
> --
>
> Key: SPARK-1391
> URL: https://issues.apache.org/jira/browse/SPARK-1391
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Shuffle
>Affects Versions: 1.0.0
>Reporter: Shivaram Venkataraman
>Assignee: Min Zhou
> Attachments: SPARK-1391.diff
>
>
> If a task tries to remotely access a cached RDD block, I get an exception 
> when the block size is > 2G. The exception is pasted below.
> Memory capacities are huge these days (> 60G), and many workflows depend on 
> having large blocks in memory, so it would be good to fix this bug.
> I don't know if the same thing happens on shuffles if one transfer (from 
> mapper to reducer) is > 2G.
> {noformat}
> 14/04/02 02:33:10 ERROR storage.BlockManagerWorker: Exception handling buffer 
> message
> java.lang.ArrayIndexOutOfBoundsException
> at 
> it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:96)
> at 
> it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:134)
> at 
> it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:164)
> at 
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
> at 
> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:38)
> at 
> org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:93)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:26)
> at 
> org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:913)
> at 
> org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:922)
> at 
> org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102)
> at 
> org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:348)
> at 
> org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:323)
> at 
> org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90)
> at 
> org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at 
> org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at 
> org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
> at 
> org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
> at 
> org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:661)
> at 
> org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:503)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
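The underlying limitation is that a single JVM byte array or ByteBuffer is indexed by an Int, so anything that funnels an entire block through one buffer caps out near 2 GB. A minimal, purely illustrative sketch (not Spark code) of chunking a large payload into several fixed-size buffers:

{code}
import java.nio.ByteBuffer

// Illustrative only: split a logical block of `totalSize` bytes into pieces of at most
// chunkSize, since one ByteBuffer cannot hold more than Int.MaxValue bytes.
def chunkSizes(totalSize: Long, chunkSize: Int = 64 * 1024 * 1024): Seq[Int] = {
  require(chunkSize > 0)
  val full = (totalSize / chunkSize).toInt
  val rest = (totalSize % chunkSize).toInt
  Seq.fill(full)(chunkSize) ++ (if (rest > 0) Seq(rest) else Nil)
}

// A 3 GB block becomes 48 chunks of 64 MB, each small enough for a single ByteBuffer.
val buffers: Seq[ByteBuffer] = chunkSizes(3L * 1024 * 1024 * 1024).map(n => ByteBuffer.allocate(n))
{code}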



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1441) Spark Core with Hadoop 0.23.X error

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1441:
-

Attachment: mvn.log
sbt.log

{code}
./make-distribution.sh --hadoop 0.23.9  > sbt.log
mvn -Dhadoop.version=0.23.9 -DskipTests package -X > mvn.log
{code}

> Spark Core with Hadoop 0.23.X error
> ---
>
> Key: SPARK-1441
> URL: https://issues.apache.org/jira/browse/SPARK-1441
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: witgo
> Attachments: mvn.log, sbt.log
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1441) Spark Core with Hadoop 0.23.X error

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1441:
-

Summary: Spark Core with Hadoop 0.23.X error  (was:  build with Hadoop 
0.23.X  error)

> Spark Core with Hadoop 0.23.X error
> ---
>
> Key: SPARK-1441
> URL: https://issues.apache.org/jira/browse/SPARK-1441
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.0.0
>Reporter: witgo
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1441) build with Hadoop 0.23.X error

2014-04-07 Thread witgo (JIRA)
witgo created SPARK-1441:


 Summary:  build with Hadoop 0.23.X  error
 Key: SPARK-1441
 URL: https://issues.apache.org/jira/browse/SPARK-1441
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.0.0
Reporter: witgo






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1433:
-

Description: 
Mesos 0.13.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0

  was:
Mesos 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0


> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> Mesos 0.13.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1424) InsertInto should work on JavaSchemaRDD as well.

2014-04-07 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962533#comment-13962533
 ] 

Michael Armbrust commented on SPARK-1424:
-

Started on this here: https://github.com/apache/spark/pull/354

A few notes: there is no way to createTableAs from a standard SQL context, as 
I'm not really sure where to put the files.

Also, it might be nice to have a "create table" variant that doesn't fail if the 
table exists but instead appends to it. That will require some minor tweaking in 
the execution engine, whereas the above options were just API extensions.

> InsertInto should work on JavaSchemaRDD as well.
> 
>
> Key: SPARK-1424
> URL: https://issues.apache.org/jira/browse/SPARK-1424
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (SPARK-1436) Compression code broke in-memory store

2014-04-07 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962517#comment-13962517
 ] 

Cheng Lian edited comment on SPARK-1436 at 4/8/14 2:10 AM:
---

Sorry, I forgot to duplicate the in-memory column byte buffer when creating new 
{{ColumnAccessor}}s, so when the column byte buffer is accessed multiple times 
its position is not reset to 0. Will fix this in PR 
[#330|https://github.com/apache/spark/pull/330] with a regression test.


was (Author: lian cheng):
Sorry, forgot to duplicate the in-memory column byte buffer when creating new 
{{ColumnAccessor}}s, so that when the column byte buffer is accessed multiple 
times, the position is not reset to 0. Will fix this in PR 
[#330|https://github.com/apache/spark/pull/330] with regression test.

> Compression code broke in-memory store
> --
>
> Key: SPARK-1436
> URL: https://issues.apache.org/jira/browse/SPARK-1436
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.0.0
>
>
> See my following comment...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1436) Compression code broke in-memory store

2014-04-07 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962517#comment-13962517
 ] 

Cheng Lian commented on SPARK-1436:
---

Sorry, I forgot to duplicate the in-memory column byte buffer when creating new 
{{ColumnAccessor}}s, so when the column byte buffer is accessed multiple times 
its position is not reset to 0. Will fix this in PR 
[#330|https://github.com/apache/spark/pull/330] with a regression test.
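The gist of the fix, as a standalone sketch using plain java.nio rather than the actual ColumnAccessor code: duplicate() gives each reader its own position and limit over the same underlying bytes, so one consumer advancing its position does not affect the next.

{code}
import java.nio.ByteBuffer

// Shared, already-built column buffer (illustrative contents: one Int and one Long).
val underlying = ByteBuffer.allocate(12)
underlying.putInt(42).putLong(7L)
underlying.flip()  // prepare for reading; position = 0

// Reading straight from `underlying` would move its position, so a second reader
// would start mid-buffer and underflow. duplicate() shares the bytes but gives each
// accessor an independent position and limit.
def newAccessor(): ByteBuffer = underlying.duplicate()

val a = newAccessor()
val b = newAccessor()
assert(a.getInt() == 42)  // advances only a's position
assert(b.getInt() == 42)  // b still starts at 0
{code}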

> Compression code broke in-memory store
> --
>
> Key: SPARK-1436
> URL: https://issues.apache.org/jira/browse/SPARK-1436
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Reynold Xin
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.0.0
>
>
> See my following comment...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1439) Aggregate Scaladocs across projects

2014-04-07 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1439:


 Summary: Aggregate Scaladocs across projects
 Key: SPARK-1439
 URL: https://issues.apache.org/jira/browse/SPARK-1439
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Reporter: Matei Zaharia
 Fix For: 1.0.0


Apparently there's a "Unidoc" plugin to put together ScalaDocs across modules: 
https://github.com/akka/akka/blob/master/project/Unidoc.scala



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1440) Generate JavaDoc instead of ScalaDoc for Java API

2014-04-07 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1440:


 Summary: Generate JavaDoc instead of ScalaDoc for Java API
 Key: SPARK-1440
 URL: https://issues.apache.org/jira/browse/SPARK-1440
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Reporter: Matei Zaharia
 Fix For: 1.0.0


It may be possible to use this plugin:  
https://github.com/typesafehub/genjavadoc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1351) Documentation Improvements for Spark 1.0

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1351:
---

Description: 
Umbrella to track necessary doc improvements. We can break these out into other 
JIRA's over time.

- Use grouping in the RDD and SparkContext scaladocs. See Schema RDD:
http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD
- Use spark-submit script wherever possible in docs.
- Have package-level documentation in Scaladoc. Also these can be grouped so 
that the o.a.s package doc looks nice.

  was:
Umbrella to track necessary doc improvements. We can break these out into other 
JIRA's over time.

- Use grouping in the RDD and SparkContext scaladocs. See Schema RDD:
http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD
- Use spark-submit script wherever possible in docs.
- Have package-level documentation in Scaladoc.


> Documentation Improvements for Spark 1.0
> 
>
> Key: SPARK-1351
> URL: https://issues.apache.org/jira/browse/SPARK-1351
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Priority: Critical
> Fix For: 1.0.0
>
>
> Umbrella to track necessary doc improvements. We can break these out into 
> other JIRA's over time.
> - Use grouping in the RDD and SparkContext scaladocs. See Schema RDD:
> http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD
> - Use spark-submit script wherever possible in docs.
> - Have package-level documentation in Scaladoc. Also these can be grouped so 
> that the o.a.s package doc looks nice.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1099) Allow inferring number of cores with local[*]

2014-04-07 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-1099.
---

Resolution: Fixed

> Allow inferring number of cores with local[*]
> -
>
> Key: SPARK-1099
> URL: https://issues.apache.org/jira/browse/SPARK-1099
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Minor
> Fix For: 1.0.0
>
>
> It seems reasonable that the default number of cores used by spark's local 
> mode (when no value is specified) is drawn from the spark.cores.max 
> configuration parameter (which, conveniently, is now settable as a 
> command-line option in spark-shell).
> For the sake of consistency, it's probable that this change would also entail 
> making the default number of cores when spark.cores.max is NOT specified to 
> be as many logical cores as are on the machine (which is what standalone mode 
> does). This too seems reasonable, as Spark is inherently a distributed system 
> and I think it's expected that it should use multiple cores by default. 
> However, it is a behavioral change, and thus requires caution.
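A minimal sketch of the master-string handling the resolved behavior implies; the regex and function names are illustrative, not Spark's actual SparkContext code.

{code}
// Hypothetical parser: "local" -> 1 core, "local[4]" -> 4, "local[*]" -> all logical cores.
val LocalN = """local\[([0-9]+|\*)\]""".r

def localCores(master: String): Option[Int] = master match {
  case "local"     => Some(1)
  case LocalN("*") => Some(Runtime.getRuntime.availableProcessors())
  case LocalN(n)   => Some(n.toInt)
  case _           => None  // not a local master
}
{code}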



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1430) Support sparse data in Python MLlib

2014-04-07 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1430:
-

Fix Version/s: 1.0.0

> Support sparse data in Python MLlib
> ---
>
> Key: SPARK-1430
> URL: https://issues.apache.org/jira/browse/SPARK-1430
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1438) Update RDD.sample() API to make seed parameter optional

2014-04-07 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1438:
-

Fix Version/s: 1.0.0

> Update RDD.sample() API to make seed parameter optional
> ---
>
> Key: SPARK-1438
> URL: https://issues.apache.org/jira/browse/SPARK-1438
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Matei Zaharia
>Priority: Blocker
>  Labels: Starter
> Fix For: 1.0.0
>
>
> When a seed is not given, it should pick one based on Math.random().
> This needs to be done in Java and Python as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1438) Update RDD.sample() API to make seed parameter optional

2014-04-07 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-1438:


 Summary: Update RDD.sample() API to make seed parameter optional
 Key: SPARK-1438
 URL: https://issues.apache.org/jira/browse/SPARK-1438
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matei Zaharia
Priority: Blocker


When a seed is not given, it should pick one based on Math.random().

This needs to be done in Java and Python as well.
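A sketch of the API shape being asked for, with an illustrative signature rather than the real RDD.sample one: the seed becomes a defaulted parameter, picked from Math.random() when the caller omits it.

{code}
import scala.util.Random

// Illustrative only: default argument evaluated per call, derived from Math.random().
def sample[T](data: Seq[T],
              fraction: Double,
              seed: Long = (java.lang.Math.random() * Long.MaxValue).toLong): Seq[T] = {
  val rng = new Random(seed)
  data.filter(_ => rng.nextDouble() < fraction)
}

// Callers can now omit the seed entirely:
val picked = sample(1 to 100, fraction = 0.1)
{code}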



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1437) Jenkins should build with Java 6

2014-04-07 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-1437:
-

Attachment: Screen Shot 2014-04-07 at 22.53.56.png

> Jenkins should build with Java 6
> 
>
> Key: SPARK-1437
> URL: https://issues.apache.org/jira/browse/SPARK-1437
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 0.9.0
>Reporter: Sean Owen
>Priority: Minor
>  Labels: javac, jenkins
> Attachments: Screen Shot 2014-04-07 at 22.53.56.png
>
>
> Apologies if this was already on someone's to-do list, but I wanted to track 
> this, as it bit two commits in the last few weeks.
> Spark is intended to work with Java 6, and so compiles with source/target 
> 1.6. Java 7 can correctly enforce Java 6 language rules and emit Java 6 
> bytecode. However, unless otherwise configured with -bootclasspath, javac 
> will use its own (Java 7) library classes. This means code that uses classes 
> in Java 7 will be allowed to compile, but the result will fail when run on 
> Java 6.
> This is why you get warnings like ...
> Using /usr/java/jdk1.7.0_51 as default JAVA_HOME.
> ...
> [warn] warning: [options] bootstrap class path not set in conjunction with 
> -source 1.6
> The solution is just to tell Jenkins to use Java 6. This may be stating the 
> obvious, but it should just be a setting under "Configure" for 
> SparkPullRequestBuilder. In our Jenkinses, JDK 6/7/8 are set up; if it's not 
> an option already I'm guessing it's not too hard to get Java 6 configured on 
> the Amplab machines.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1391) BlockManager cannot transfer blocks larger than 2G in size

2014-04-07 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962319#comment-13962319
 ] 

Min Zhou commented on SPARK-1391:
-

Any update on your test , [~shivaram] ?

> BlockManager cannot transfer blocks larger than 2G in size
> --
>
> Key: SPARK-1391
> URL: https://issues.apache.org/jira/browse/SPARK-1391
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Shuffle
>Affects Versions: 1.0.0
>Reporter: Shivaram Venkataraman
>Assignee: Min Zhou
> Attachments: SPARK-1391.diff
>
>
> If a task tries to remotely access a cached RDD block, I get an exception 
> when the block size is > 2G. The exception is pasted below.
> Memory capacities are huge these days (> 60G), and many workflows depend on 
> having large blocks in memory, so it would be good to fix this bug.
> I don't know if the same thing happens on shuffles if one transfer (from 
> mapper to reducer) is > 2G.
> {noformat}
> 14/04/02 02:33:10 ERROR storage.BlockManagerWorker: Exception handling buffer 
> message
> java.lang.ArrayIndexOutOfBoundsException
> at 
> it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:96)
> at 
> it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:134)
> at 
> it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:164)
> at 
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
> at 
> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:38)
> at 
> org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:93)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:26)
> at 
> org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:913)
> at 
> org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:922)
> at 
> org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102)
> at 
> org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:348)
> at 
> org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:323)
> at 
> org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90)
> at 
> org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at 
> org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at 
> org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
> at 
> org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
> at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
> at 
> org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:661)
> at 
> org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:503)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1437) Jenkins should build with Java 6

2014-04-07 Thread Sean Owen (JIRA)
Sean Owen created SPARK-1437:


 Summary: Jenkins should build with Java 6
 Key: SPARK-1437
 URL: https://issues.apache.org/jira/browse/SPARK-1437
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 0.9.0
Reporter: Sean Owen
Priority: Minor


Apologies if this was already on someone's to-do list, but I wanted to track 
this, as it bit two commits in the last few weeks.

Spark is intended to work with Java 6, and so compiles with source/target 1.6. 
Java 7 can correctly enforce Java 6 language rules and emit Java 6 bytecode. 
However, unless otherwise configured with -bootclasspath, javac will use its 
own (Java 7) library classes. This means code that uses classes in Java 7 will 
be allowed to compile, but the result will fail when run on Java 6.

This is why you get warnings like ...

Using /usr/java/jdk1.7.0_51 as default JAVA_HOME.
...
[warn] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6

The solution is just to tell Jenkins to use Java 6. This may be stating the 
obvious, but it should just be a setting under "Configure" for 
SparkPullRequestBuilder. In our Jenkinses, JDK 6/7/8 are set up; if it's not an 
option already I'm guessing it's not too hard to get Java 6 configured on the 
Amplab machines.
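To make the failure mode concrete, here is a hedged example (not taken from the affected commits): code that touches a class introduced in Java 7 compiles cleanly when the build runs on a JDK 7 class library, and only fails at runtime on a Java 6 JRE.

{code}
// java.nio.file.Files exists only in Java 7+. Compiling this against a JDK 7 class
// library succeeds (the compiler does not know which JRE will run it); running the
// result on Java 6 throws NoClassDefFoundError: java/nio/file/Files.
import java.nio.file.{Files, Paths}

object Java7Only {
  def main(args: Array[String]): Unit = {
    println(Files.exists(Paths.get("/tmp")))
  }
}
{code}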



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1427) HQL Examples Don't Work

2014-04-07 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1427.
-

Resolution: Fixed

Fixed the toString issue here: https://github.com/apache/spark/pull/343

Could not recreate the permgen problem, but I did run the examples by hand 
successfully.

> HQL Examples Don't Work
> ---
>
> Key: SPARK-1427
> URL: https://issues.apache.org/jira/browse/SPARK-1427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
> Fix For: 1.0.0
>
>
> {code}
> scala> hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> 14/04/05 22:40:29 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT 
> EXISTS src (key INT, value STRING)
> 14/04/05 22:40:30 INFO ParseDriver: Parse Completed
> 14/04/05 22:40:30 INFO Driver: 
> 14/04/05 22:40:30 INFO Driver: 
> 14/04/05 22:40:30 INFO Driver: 
> 14/04/05 22:40:30 INFO Driver: 
> 14/04/05 22:40:30 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT 
> EXISTS src (key INT, value STRING)
> 14/04/05 22:40:30 INFO ParseDriver: Parse Completed
> 14/04/05 22:40:30 INFO Driver:  end=1396762830163 duration=1>
> 14/04/05 22:40:30 INFO Driver: 
> 14/04/05 22:40:30 INFO SemanticAnalyzer: Starting Semantic Analysis
> 14/04/05 22:40:30 INFO SemanticAnalyzer: Creating table src position=27
> 14/04/05 22:40:30 INFO HiveMetaStore: 0: Opening raw store with implemenation 
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 14/04/05 22:40:30 INFO ObjectStore: ObjectStore, initialize called
> 14/04/05 22:40:30 INFO Persistence: Property datanucleus.cache.level2 unknown 
> - will be ignored
> 14/04/05 22:40:30 WARN BoneCPConfig: Max Connections < 1. Setting to 20
> 14/04/05 22:40:32 INFO ObjectStore: Setting MetaStore object pin classes with 
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 14/04/05 22:40:32 INFO ObjectStore: Initialized ObjectStore
> 14/04/05 22:40:33 WARN BoneCPConfig: Max Connections < 1. Setting to 20
> 14/04/05 22:40:33 INFO HiveMetaStore: 0: get_table : db=default tbl=src
> 14/04/05 22:40:33 INFO audit: ugi=patrick ip=unknown-ip-addr  
> cmd=get_table : db=default tbl=src  
> 14/04/05 22:40:33 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 14/04/05 22:40:33 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 14/04/05 22:40:34 INFO Driver: Semantic Analysis Completed
> 14/04/05 22:40:34 INFO Driver:  start=1396762830163 end=1396762834001 duration=3838>
> 14/04/05 22:40:34 INFO Driver: Returning Hive schema: 
> Schema(fieldSchemas:null, properties:null)
> 14/04/05 22:40:34 INFO Driver:  end=1396762834006 duration=3860>
> 14/04/05 22:40:34 INFO Driver: 
> 14/04/05 22:40:34 INFO Driver: Starting command: CREATE TABLE IF NOT EXISTS 
> src (key INT, value STRING)
> 14/04/05 22:40:34 INFO Driver:  start=1396762830146 end=1396762834016 duration=3870>
> 14/04/05 22:40:34 INFO Driver: 
> 14/04/05 22:40:34 INFO Driver:  end=1396762834016 duration=0>
> 14/04/05 22:40:34 INFO Driver:  start=1396762834006 end=1396762834017 duration=11>
> 14/04/05 22:40:34 INFO Driver: OK
> 14/04/05 22:40:34 INFO Driver: 
> 14/04/05 22:40:34 INFO Driver:  start=1396762834019 end=1396762834019 duration=0>
> 14/04/05 22:40:34 INFO Driver:  start=1396762830146 end=1396762834019 duration=3873>
> 14/04/05 22:40:34 INFO Driver: 
> 14/04/05 22:40:34 INFO Driver:  start=1396762834019 end=1396762834020 duration=1>
> java.lang.AssertionError: assertion failed: No plan for NativeCommand CREATE 
> TABLE IF NOT EXISTS src (key INT, value STRING)
>   at scala.Predef$.assert(Predef.scala:179)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:218)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:218)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:219)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:219)
>   at 
> org.apache.spark.sql.SchemaRDDLike$class.toString(SchemaRDDLike.scala:44)
>   at org.apache.spark.sql.SchemaRDD.toString(SchemaRDD.scala:93)
>   at java.lang.String.valueOf(String.java:2854)
>   at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331)
>   at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
>   at .(:10)
>   at .()
>   at $print()
>  

[jira] [Updated] (SPARK-1099) Allow inferring number of cores with local[*]

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1099:
---

Summary: Allow inferring number of cores with local[*]  (was: Spark's local 
mode should respect spark.cores.max by default)

> Allow inferring number of cores with local[*]
> -
>
> Key: SPARK-1099
> URL: https://issues.apache.org/jira/browse/SPARK-1099
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Minor
> Fix For: 1.0.0
>
>
> It seems reasonable that the default number of cores used by spark's local 
> mode (when no value is specified) is drawn from the spark.cores.max 
> configuration parameter (which, conveniently, is now settable as a 
> command-line option in spark-shell).
> For the sake of consistency, it's probable that this change would also entail 
> making the default number of cores when spark.cores.max is NOT specified to 
> be as many logical cores are on the machine (which is what standalone mode 
> does). This too seems reasonable, as Spark is inherently a distributed system 
> and I think it's expected that it should use multiple cores by default. 
> However, it is a behavioral change, and thus requires caution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1099) Spark's local mode should respect spark.cores.max by default

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1099:
---

Summary: Spark's local mode should respect spark.cores.max by default  
(was: Spark's local mode should probably respect spark.cores.max by default)

> Spark's local mode should respect spark.cores.max by default
> 
>
> Key: SPARK-1099
> URL: https://issues.apache.org/jira/browse/SPARK-1099
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Reporter: Aaron Davidson
>Assignee: Aaron Davidson
>Priority: Minor
> Fix For: 1.0.0
>
>
> It seems reasonable that the default number of cores used by spark's local 
> mode (when no value is specified) is drawn from the spark.cores.max 
> configuration parameter (which, conveniently, is now settable as a 
> command-line option in spark-shell).
> For the sake of consistency, it's probable that this change would also entail 
> making the default number of cores when spark.cores.max is NOT specified to 
> be as many logical cores are on the machine (which is what standalone mode 
> does). This too seems reasonable, as Spark is inherently a distributed system 
> and I think it's expected that it should use multiple cores by default. 
> However, it is a behavioral change, and thus requires caution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1035) Use a single mechanism for distributing jars on Yarn

2014-04-07 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved SPARK-1035.
---

Resolution: Won't Fix

When I originally filed this, I didn't realize that jars could be added at 
runtime.  In light of this, I don't think we can do much better than the 
current state of things.

> Use a single mechanism for distributing jars on Yarn
> 
>
> Key: SPARK-1035
> URL: https://issues.apache.org/jira/browse/SPARK-1035
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 0.9.0
>Reporter: Sandy Pérez González
>
> When running Spark on Yarn, the app jar is distributed through a different 
> mechanism than additional added jars. The app jar gets to every worker node 
> as a Yarn local resource. Additional jars only get to the app master, and the 
> app master serves them to workers with the HTTP file server.   Strangeness 
> comes when an application addJar's the app jar, which is a natural thing to 
> do in mesos or standalone mode, but in Yarn mode, will try to distribute the 
> same jar through a different mechanism.  Using the same mechanism for both 
> would eliminate this issue, as well as greatly simplify debugging 
> ClassNotFoundExceptions in workers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1059) Now that we submit core requests to YARN, fix usage message in ClientArguments

2014-04-07 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved SPARK-1059.
---

Resolution: Duplicate

This got fixed in Tom's security patch.

> Now that we submit core requests to YARN, fix usage message in ClientArguments
> --
>
> Key: SPARK-1059
> URL: https://issues.apache.org/jira/browse/SPARK-1059
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Sandy Pérez González
>Priority: Minor
>
> "Number of cores for the workers (Default: 1). This is unsused right now."



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1209) SparkHadoopUtil should not use package org.apache.hadoop

2014-04-07 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved SPARK-1209.
---

Resolution: Fixed

> SparkHadoopUtil should not use package org.apache.hadoop
> 
>
> Key: SPARK-1209
> URL: https://issues.apache.org/jira/browse/SPARK-1209
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Sandy Pérez González
>Assignee: Mark Grover
>
> It's private, so the change won't break compatibility



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (SPARK-1101) Umbrella for hardening Spark on YARN

2014-04-07 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reassigned SPARK-1101:
-

Assignee: Sandy Ryza  (was: Sandy Pérez González)

> Umbrella for hardening Spark on YARN
> 
>
> Key: SPARK-1101
> URL: https://issues.apache.org/jira/browse/SPARK-1101
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 0.9.0
>Reporter: Sandy Pérez González
>Assignee: Sandy Ryza
>
> This is an umbrella JIRA to track near-term improvements for Spark on YARN.  
> I don't think huge changes are required - just fixing some bugs, plugging 
> usability gaps, and enhancing documentation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1409) Flaky Test: "actor input stream" test in org.apache.spark.streaming.InputStreamsSuite

2014-04-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962171#comment-13962171
 ] 

Patrick Wendell commented on SPARK-1409:


I've disabled this test for now.

> Flaky Test: "actor input stream" test in 
> org.apache.spark.streaming.InputStreamsSuite
> -
>
> Key: SPARK-1409
> URL: https://issues.apache.org/jira/browse/SPARK-1409
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Michael Armbrust
>Assignee: Tathagata Das
>
> Here are just a few cases:
> https://travis-ci.org/apache/spark/jobs/22151827
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1436) Compression code broke in-memory store

2014-04-07 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962155#comment-13962155
 ] 

Reynold Xin commented on SPARK-1436:


Try run the following code:

{code}
package org.apache.spark.sql

import org.apache.spark.sql.test.TestSQLContext._
import org.apache.spark.sql.catalyst.util._

case class Data(a: Int, b: Long)

object AggregationBenchmark {
  def main(args: Array[String]): Unit = {
    val rdd =
      sparkContext.parallelize(1 to 20).flatMap(_ => (1 to 50).map(i => Data(i % 100, i)))
    rdd.registerAsTable("data")
    cacheTable("data")

    (1 to 10).foreach { i =>
      println(s"=== ITERATION $i ===")

      benchmark { println("SELECT COUNT() FROM data:" + sql("SELECT COUNT(*) FROM data").collect().head) }

      println("SELECT a, SUM(b) FROM data GROUP BY a")
      benchmark { sql("SELECT a, SUM(b) FROM data GROUP BY a").count() }

      println("SELECT SUM(b) FROM data")
      benchmark { sql("SELECT SUM(b) FROM data").count() }
    }
  }
}
{code}

The following exception is thrown:
{code}
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355)
at 
org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1.(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:60)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1.apply(InMemoryColumnarTableScan.scala:56)
at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504)
at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:504)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:229)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:220)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.Task.run(Task.scala:52)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:46)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/04/07 12:07:38 WARN TaskSetManager: Lost TID 3 (task 4.0:0)
14/04/07 12:07:38 WARN TaskSetManager: Loss was due to 
java.nio.BufferUnderflowException
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355)
at 
org.apache.spark.sql.columnar.ColumnAccessor$.apply(ColumnAccessor.scala:103)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
org.apache.spark.sql.columnar.InMemoryColumnarTableScan$$anonfun$execute$1$$anon$1$$anonfun$3.apply(InMemoryColumnarTableScan.scala:61)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(

[jira] [Created] (SPARK-1436) Compression code broke in-memory store

2014-04-07 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-1436:
--

 Summary: Compression code broke in-memory store
 Key: SPARK-1436
 URL: https://issues.apache.org/jira/browse/SPARK-1436
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.0.0


See my following comment...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-04-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962133#comment-13962133
 ] 

Sean Owen commented on SPARK-1406:
--

PMML is the de facto serialization, so certainly the one to consider 
leveraging. It's just a serialization, so it's not by itself going to help with 
feature transformation.

Given data and PMML, it's fairly easy to use things like JPMML to do 
evaluation. You could write some thin wrapper code in MLlib to facilitate that, 
but it may not give a lot of marginal benefit.

Import/export is a bit different. Again JPMML will do all the mechanisms of 
serializing an object model, so that need not be written.

I think export is more important than import, mostly because I think of MLlib 
as a model builder, and therefore a producer rather than consumer of models. 
Export is also easier since you just need to write the glue code to translate 
some MLlib object into a JPMML representation, and only need to worry about 
dealing with the subset of PMML that covers whatever the MLlib output describes.

Import is harder for the same reason -- you're not going to want to or be able 
to support everything PMML can describe, so it's already a question of trying 
to map the vocab as best you can to whatever MLlib supports. It's also less 
important, IMHO, since MLlib's value is more in making the model than doing 
something with it right now.

I would suggest the import/export stuff be kept close to, but separate from, the 
other MLlib code. Not a different module, just cleanly separated from the 
abstract representation.

I think there's a whole project's worth of stuff one could do around consuming, 
managing, serving models!

So to summarize: I'd suggest scoping this to start as "wire up all *Model files 
to JPMML equivalents, as an 'export' package" or something.
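As a purely illustrative sketch of what an "export" entry point could look like, here is hand-rolled PMML XML for a toy linear model; the trait and class names are invented for this example, and real code would build a JPMML object model (plus Header, DataDictionary and MiningSchema elements) instead of a raw string.

{code}
// Hypothetical export trait, not an MLlib API.
trait PMMLExportable {
  def toPMML: String
}

// Toy linear model: prediction = intercept + weights . features
case class SimpleLinearModel(weights: Array[Double], intercept: Double) extends PMMLExportable {
  def toPMML: String = {
    val fields = weights.indices.map(i => s"""<NumericPredictor name="x$i" coefficient="${weights(i)}"/>""")
    s"""<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
       |  <RegressionModel functionName="regression">
       |    <RegressionTable intercept="$intercept">
       |      ${fields.mkString("\n      ")}
       |    </RegressionTable>
       |  </RegressionModel>
       |</PMML>""".stripMargin
  }
}

println(SimpleLinearModel(Array(0.5, -1.2), intercept = 3.0).toPMML)
{code}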

> PMML model evaluation support via MLib
> --
>
> Key: SPARK-1406
> URL: https://issues.apache.org/jira/browse/SPARK-1406
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Thomas Darimont
>
> It would be useful if spark would provide support the evaluation of PMML 
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would allow to use analytical models that were created with a 
> statistical modeling tool like R, SAS, SPSS, etc. with Spark (MLib) which 
> would perform the actual model evaluation for a given input tuple. The PMML 
> model would then just contain the "parameterization" of an analytical model.
> Other projects like JPMML-Evaluator do a similar thing.
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1390) Refactor RDD backed matrices

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-1390:
-

Fix Version/s: 1.0.0

> Refactor RDD backed matrices
> 
>
> Key: SPARK-1390
> URL: https://issues.apache.org/jira/browse/SPARK-1390
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Blocker
> Fix For: 1.0.0
>
>
> The current interfaces of RDD-backed matrices need refactoring for the v1.0 
> release. It would be better if we had a clear separation between local matrices 
> and those backed by RDDs. Right now, we have 
> 1. org.apache.spark.mllib.linalg.SparseMatrix, which is a wrapper over an RDD 
> of matrix entries, i.e., coordinate list format.
> 2. org.apache.spark.mllib.linalg.TallSkinnyDenseMatrix, which is a wrapper 
> over RDD[Array[Double]], i.e. row-oriented format.
> We will see naming collision when we introduce local SparseMatrix and the 
> name TallSkinnyDenseMatrix is not exact if we switch to RDD[Vector] instead 
> of RDD[Array[Double]]. It would be better to have "RDD" in the type name to 
> suggest that operations will trigger a job.
> The proposed names (all under org.apache.spark.mllib.linalg.rdd):
> 1. RDDMatrix: trait for matrices backed by one or more RDDs
> 2. CoordinateRDDMatrix: wrapper of RDD[RDDMatrixEntry]
> 3. RowRDDMatrix: wrapper of RDD[Vector] whose rows do not have special 
> ordering
> 4. IndexedRowRDDMatrix: wrapper of RDD[(Long, Vector)] whose rows are 
> associated with indices
> The proposal is subject to change, but it would be nice to make the changes 
> before v1.0.
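A skeleton of the proposed hierarchy using the names from the proposal; the signatures are illustrative, and V stands in for whatever local vector type is chosen.

{code}
import org.apache.spark.rdd.RDD

// Illustrative skeletons only; the real interfaces may differ.
case class RDDMatrixEntry(i: Long, j: Long, value: Double)

trait RDDMatrix {
  def numRows(): Long
  def numCols(): Long
}

/** Coordinate-list (COO) format: one entry per nonzero. */
class CoordinateRDDMatrix(val entries: RDD[RDDMatrixEntry]) extends RDDMatrix {
  def numRows(): Long = entries.map(_.i).reduce((a, b) => math.max(a, b)) + 1
  def numCols(): Long = entries.map(_.j).reduce((a, b) => math.max(a, b)) + 1
}

/** Rows in no particular order. */
class RowRDDMatrix[V](val rows: RDD[V], nRows: Long, nCols: Long) extends RDDMatrix {
  def numRows(): Long = nRows
  def numCols(): Long = nCols
}

/** Rows tagged with explicit indices. */
class IndexedRowRDDMatrix[V](val rows: RDD[(Long, V)], nRows: Long, nCols: Long) extends RDDMatrix {
  def numRows(): Long = nRows
  def numCols(): Long = nCols
}
{code}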



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1252) On YARN, use container-log4j.properties for executors

2014-04-07 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1252.
--

Resolution: Fixed

> On YARN, use container-log4j.properties for executors
> -
>
> Key: SPARK-1252
> URL: https://issues.apache.org/jira/browse/SPARK-1252
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 0.9.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
> Fix For: 1.0.0
>
>
> YARN provides a log4j.properties file that's distinct from the NodeManager 
> log4j.properties.  Containers are supposed to use this so that they don't try 
> to write to the NodeManager log file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1252) On YARN, use container-log4j.properties for executors

2014-04-07 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962109#comment-13962109
 ] 

Thomas Graves commented on SPARK-1252:
--

https://github.com/apache/spark/pull/148

> On YARN, use container-log4j.properties for executors
> -
>
> Key: SPARK-1252
> URL: https://issues.apache.org/jira/browse/SPARK-1252
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 0.9.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Critical
> Fix For: 1.0.0
>
>
> YARN provides a log4j.properties file that's distinct from the NodeManager 
> log4j.properties.  Containers are supposed to use this so that they don't try 
> to write to the NodeManager log file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1214) 0-1 labels

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1214.
--

   Resolution: Fixed
Fix Version/s: 0.9.0
 Assignee: Xiangrui Meng  (was: Shashidhar E S)

Fixed in 0.9.0 or an earlier version.

> 0-1 labels 
> ---
>
> Key: SPARK-1214
> URL: https://issues.apache.org/jira/browse/SPARK-1214
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Xiangrui Meng
> Fix For: 0.9.0
>
>
> Use \{0,1\} labels for binary classification instead of {-1,1}. Advantages 
> include:
> (+) Consistency across algorithms
> (+) Naturally extends to multi-class classification
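For example, converting existing {-1, 1} labels to the {0, 1} convention is a one-line map; this hedged sketch uses a plain collection, but the same map applies to an RDD of labels.

{code}
// {-1.0, 1.0} labels -> {0.0, 1.0}
val plusMinus = Seq(-1.0, 1.0, 1.0, -1.0)
val zeroOne   = plusMinus.map(y => if (y > 0) 1.0 else 0.0)  // Seq(0.0, 1.0, 1.0, 0.0)
{code}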



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1217) Add proximal gradient updater.

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1217.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

> Add proximal gradient updater.
> --
>
> Key: SPARK-1217
> URL: https://issues.apache.org/jira/browse/SPARK-1217
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Ameet Talwalkar
> Fix For: 0.9.0
>
>
> Add proximal gradient updater, in particular for L1 regularization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1435) Don't assume context class loader is set when creating classes via reflection

2014-04-07 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-1435:
--

 Summary: Don't assume context class loader is set when creating 
classes via reflection
 Key: SPARK-1435
 URL: https://issues.apache.org/jira/browse/SPARK-1435
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1435) Don't assume context class loader is set when creating classes via reflection

2014-04-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962061#comment-13962061
 ] 

Patrick Wendell commented on SPARK-1435:


SPARK-1403 provides a workaround in the case of Mesos, but in general we 
should just avoid making this assumption.

> Don't assume context class loader is set when creating classes via reflection
> -
>
> Key: SPARK-1435
> URL: https://issues.apache.org/jira/browse/SPARK-1435
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Patrick Wendell
>Assignee: Patrick Wendell
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1222) Logistic Regression (+ regularized variants)

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1222.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Implemented in 0.9.0 or an earlier version.

> Logistic Regression (+ regularized variants)
> 
>
> Key: SPARK-1222
> URL: https://issues.apache.org/jira/browse/SPARK-1222
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
> Fix For: 0.9.0
>
>
> Implement Logistic Regression using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1223) Linear Regression (+ regularized variants)

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1223.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Implemented in 0.9.0 or an earlier version.

> Linear Regression (+ regularized variants)
> --
>
> Key: SPARK-1223
> URL: https://issues.apache.org/jira/browse/SPARK-1223
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
> Fix For: 0.9.0
>
>
> Implement Linear regression using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1219) Minibatch SGD with disjoint partitions

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-1219:
-

Fix Version/s: 0.9.0

> Minibatch SGD with disjoint partitions
> --
>
> Key: SPARK-1219
> URL: https://issues.apache.org/jira/browse/SPARK-1219
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
> Fix For: 0.9.0
>
>
> Takes a gradient function as input.  At each iteration, we run stochastic 
> gradient descent locally on each worker with a fraction (alpha) of the data 
> points selected randomly and disjointly (i.e., we ensure that we touch all 
> datapoints after at most 1/alpha iterations).
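
For illustration, a minimal Scala sketch of the disjoint-selection scheme described above (the object name, method signature, and gradient representation are all hypothetical, not MLlib's API): each point is assigned once to one of roughly 1/alpha buckets, and iteration t trains on bucket t mod numBuckets, so every point is visited within 1/alpha iterations.

{code}
import scala.util.Random
import org.apache.spark.rdd.RDD

// Sketch only: a standalone minibatch-SGD driver, not MLlib's GradientDescent.
object DisjointMinibatchSGD {
  def run(
      data: RDD[(Array[Double], Double)],     // (features, label)
      gradient: (Array[Double], (Array[Double], Double)) => Array[Double],
      alpha: Double,                           // fraction of data used per iteration
      iterations: Int,
      stepSize: Double,
      dim: Int): Array[Double] = {

    val numBuckets = math.max(1, math.round(1.0 / alpha).toInt)

    // Assign each point to a fixed bucket once, so minibatches are disjoint.
    val bucketed = data.mapPartitionsWithIndex { (idx, iter) =>
      val rng = new Random(idx)
      iter.map(p => (rng.nextInt(numBuckets), p))
    }.cache()

    var weights = Array.fill(dim)(0.0)
    for (t <- 0 until iterations) {
      val w = weights                          // immutable snapshot for the closure
      // Iteration t touches only bucket t mod numBuckets (assumed non-empty).
      val batch = bucketed.filter(_._1 == t % numBuckets).map(_._2)
      val (gradSum, count) = batch
        .map(p => (gradient(w, p), 1L))
        .reduce { case ((g1, c1), (g2, c2)) =>
          (g1.zip(g2).map { case (a, b) => a + b }, c1 + c2)
        }
      weights = weights.zip(gradSum).map { case (wi, gi) => wi - stepSize * gi / count }
    }
    weights
  }
}
{code}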



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1218) Minibatch SGD with random sampling

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1218.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Fixed in 0.9.0 or an earlier version.

> Minibatch SGD with random sampling
> --
>
> Key: SPARK-1218
> URL: https://issues.apache.org/jira/browse/SPARK-1218
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
> Fix For: 0.9.0
>
>
> Takes a gradient function as input.  At each iteration, we run stochastic 
> gradient descent locally on each worker with a fraction of the data points 
> selected randomly and with replacement (i.e., sampled points may overlap 
> across iterations).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1221) SVMs (+ regularized variants)

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1221.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Implemented in 0.9.0 or an earlier version.

> SVMs (+ regularized variants)
> -
>
> Key: SPARK-1221
> URL: https://issues.apache.org/jira/browse/SPARK-1221
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
> Fix For: 0.9.0
>
>
> Implement Support Vector Machines using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1219) Minibatch SGD with disjoint partitions

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-1219.
--

Resolution: Fixed

Implemented in 0.9.0 or an earlier version.

> Minibatch SGD with disjoint partitions
> --
>
> Key: SPARK-1219
> URL: https://issues.apache.org/jira/browse/SPARK-1219
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>
> Takes a gradient function as input.  At each iteration, we run stochastic 
> gradient descent locally on each worker with a fraction (alpha) of the data 
> points selected randomly and disjointly (i.e., we ensure that we touch all 
> datapoints after at most 1/alpha iterations).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1406) PMML model evaluation support via MLib

2014-04-07 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962048#comment-13962048
 ] 

Xiangrui Meng commented on SPARK-1406:
--

I think we should support PMML import/export in MLlib. PMML also provides 
feature transformations, for which MLlib has only limited support at this time. The 
questions are 1) how we can leverage existing PMML packages, and 2) how many 
people will volunteer.

Sean, it would be super helpful if you could share some experience with Oryx's PMML 
support, since I'm also not sure whether this is the right time to start.

> PMML model evaluation support via MLib
> --
>
> Key: SPARK-1406
> URL: https://issues.apache.org/jira/browse/SPARK-1406
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Thomas Darimont
>
> It would be useful if Spark provided support for the evaluation of PMML 
> models (http://www.dmg.org/v4-2/GeneralStructure.html).
> This would make it possible to use analytical models created with a 
> statistical modeling tool like R, SAS, SPSS, etc. with Spark (MLlib), which 
> would perform the actual model evaluation for a given input tuple. The PMML 
> model would then just contain the "parameterization" of an analytical model.
> Other projects, like JPMML-Evaluator, do a similar thing:
> https://github.com/jpmml/jpmml/tree/master/pmml-evaluator



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962032#comment-13962032
 ] 

Patrick Wendell commented on SPARK-1403:


The underlying issue here is that we've made assumptions in various parts of 
the codebase that the context classloader is set on a thread. In general, we 
should relax these assumptions and just fall back to the classloader that loaded 
Spark. As a workaround, this patch:

https://github.com/apache/spark/pull/322/files

just sets the classloader to the system class loader manually.
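
A minimal sketch of the kind of fallback being suggested; ClassLoaderFallback is an illustrative helper, not an existing Spark utility:

{code}
// Sketch only: prefer the thread's context classloader, but fall back to the
// loader that loaded this class when the context classloader is not set.
object ClassLoaderFallback {
  def resolve(): ClassLoader =
    Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)

  // Example: instantiate a class reflectively without assuming the cluster
  // backend has set the context classloader on the calling thread.
  def instantiate[T](className: String): T =
    Class.forName(className, true, resolve()).newInstance().asInstanceOf[T]
}
{code}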

> Spark on Mesos does not set Thread's context class loader
> -
>
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
> Environment: ubuntu 12.04 on vagrant
>Reporter: Bharath Bhushan
>Priority: Blocker
>
> I can run Spark 0.9.0 on Mesos but not Spark 1.0.0. This is because the Spark 
> executor on the Mesos slave throws a java.lang.ClassNotFoundException for 
> org.apache.spark.serializer.JavaSerializer.
> The lengthy discussion is here: 
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1403) Mesos on Spark does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1403:
---

Summary: Mesos on Spark does not set Thread's context class loader  (was:  
java.lang.ClassNotFoundException - spark on mesos)

> Mesos on Spark does not set Thread's context class loader
> -
>
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
> Environment: ubuntu 12.04 on vagrant
>Reporter: Bharath Bhushan
>Priority: Blocker
>
> I can run Spark 0.9.0 on Mesos but not Spark 1.0.0. This is because the Spark 
> executor on the Mesos slave throws a java.lang.ClassNotFoundException for 
> org.apache.spark.serializer.JavaSerializer.
> The lengthy discussion is here: 
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1403:
---

Summary: Spark on Mesos does not set Thread's context class loader  (was: 
Mesos on Spark does not set Thread's context class loader)

> Spark on Mesos does not set Thread's context class loader
> -
>
> Key: SPARK-1403
> URL: https://issues.apache.org/jira/browse/SPARK-1403
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
> Environment: ubuntu 12.04 on vagrant
>Reporter: Bharath Bhushan
>Priority: Blocker
>
> I can run Spark 0.9.0 on Mesos but not Spark 1.0.0. This is because the Spark 
> executor on the Mesos slave throws a java.lang.ClassNotFoundException for 
> org.apache.spark.serializer.JavaSerializer.
> The lengthy discussion is here: 
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1021) sortByKey() launches a cluster job when it shouldn't

2014-04-07 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962026#comment-13962026
 ] 

Matei Zaharia commented on SPARK-1021:
--

Note that if we do this, we'll need a similar fix in Python, which may be 
trickier.

> sortByKey() launches a cluster job when it shouldn't
> 
>
> Key: SPARK-1021
> URL: https://issues.apache.org/jira/browse/SPARK-1021
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Andrew Ash
>  Labels: starter
>
> The sortByKey() method is listed as a transformation, not an action, in the 
> documentation.  But it launches a cluster job regardless.
> http://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
> Some discussion on the mailing list suggested that this is a problem with the 
> rdd.count() call inside Partitioner.scala's rangeBounds method.
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala#L102
> Josh Rosen suggests that rangeBounds should be made into a lazy variable:
> {quote}
> I wonder whether making RangePartitioner.rangeBounds into a lazy val would 
> fix this 
> (https://github.com/apache/incubator-spark/blob/6169fe14a140146602fb07cfcd13eee6efad98f9/core/src/main/scala/org/apache/spark/Partitioner.scala#L95).
>   We'd need to make sure that rangeBounds() is never called before an action 
> is performed.  This could be tricky because it's called in the 
> RangePartitioner.equals() method.  Maybe it's sufficient to just compare the 
> number of partitions, the ids of the RDDs used to create the 
> RangePartitioner, and the sort ordering.  This still supports the case where 
> I range-partition one RDD and pass the same partitioner to a different RDD.  
> It breaks support for the case where two range partitioners created on 
> different RDDs happened to have the same rangeBounds(), but it seems unlikely 
> that this would really harm performance since it's probably unlikely that the 
> range partitioners are equal by chance.
> {quote}
> Can we please make this happen?  I'll send a PR on GitHub to start the 
> discussion and testing.
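
A minimal sketch of the lazy-val idea from the quote, simplified to Double keys; this is illustrative only and is not Spark's actual RangePartitioner:

{code}
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Sketch only: bounds are computed lazily so constructing the partitioner does
// not launch a job, and equals() avoids forcing them, as suggested above.
class LazyRangePartitioner(override val numPartitions: Int, rdd: RDD[(Double, _)])
  extends Partitioner {

  private val sourceRddId = rdd.id

  // Sampling the parent RDD runs a job, so defer it until a key is partitioned.
  private lazy val rangeBounds: Array[Double] = {
    val sorted = rdd.map(_._1).sample(false, 0.05, 42).collect().sorted
    if (sorted.isEmpty) Array.empty[Double]
    else Array.tabulate(numPartitions - 1) { i =>
      sorted(math.min((i + 1) * sorted.length / numPartitions, sorted.length - 1))
    }
  }

  override def getPartition(key: Any): Int = {
    val k = key.asInstanceOf[Double]
    val idx = rangeBounds.indexWhere(k <= _)
    if (idx < 0) numPartitions - 1 else idx
  }

  // Compare cheap metadata (partition count, source RDD id) instead of the
  // sampled bounds, so equality checks never trigger a job.
  override def equals(other: Any): Boolean = other match {
    case p: LazyRangePartitioner =>
      p.numPartitions == numPartitions && p.sourceRddId == sourceRddId
    case _ => false
  }

  override def hashCode(): Int = 31 * numPartitions + sourceRddId
}
{code}

One trade-off the quote hints at: because the bounds are deferred, the partitioner keeps a reference to the source RDD until they are forced, which is part of what makes the interplay with equals() tricky in practice.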



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1432:
---

Assignee: Davis Shepherd

> Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker
> -
>
> Key: SPARK-1432
> URL: https://issues.apache.org/jira/browse/SPARK-1432
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.9.0
>Reporter: Davis Shepherd
>Assignee: Davis Shepherd
> Fix For: 1.0.0, 0.9.2
>
>
> JobProgressTracker continuously cleans up old metadata as per the 
> spark.ui.retainedStages configuration parameter. It seems however that not 
> all metadata maps are being cleaned, in particular stageIdToExecutorSummaries 
> could grow in an unbounded manner in a long running application.
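
For illustration, a minimal sketch of the kind of bounded trimming that spark.ui.retainedStages is meant to enforce, extended to the executor-summary map; the class and field names are hypothetical, not the actual listener code:

{code}
import scala.collection.mutable

// Sketch only: every per-stage map must be trimmed together, otherwise maps
// that are skipped (like the executor summaries here) grow without bound.
class BoundedStageMetadata(retainedStages: Int) {
  // LinkedHashMap preserves insertion order, so the head is the oldest stage.
  private val stageIdToCompletionTime = mutable.LinkedHashMap.empty[Int, Long]
  private val stageIdToExecutorSummaries = mutable.HashMap.empty[Int, Map[String, Long]]

  def onStageCompleted(stageId: Int, executorSummaries: Map[String, Long]): Unit = {
    stageIdToCompletionTime(stageId) = System.currentTimeMillis()
    stageIdToExecutorSummaries(stageId) = executorSummaries
    trimIfNecessary()
  }

  private def trimIfNecessary(): Unit = {
    while (stageIdToCompletionTime.size > retainedStages) {
      val oldestStageId = stageIdToCompletionTime.head._1
      stageIdToCompletionTime.remove(oldestStageId)
      stageIdToExecutorSummaries.remove(oldestStageId)  // the map the report says is missed
    }
  }
}
{code}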



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962022#comment-13962022
 ] 

Patrick Wendell commented on SPARK-1432:


https://github.com/apache/spark/pull/338

> Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker
> -
>
> Key: SPARK-1432
> URL: https://issues.apache.org/jira/browse/SPARK-1432
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.9.0
>Reporter: Davis Shepherd
>Assignee: Davis Shepherd
> Fix For: 1.0.0, 0.9.2
>
>
> JobProgressTracker continuously cleans up old metadata as per the 
> spark.ui.retainedStages configuration parameter. It seems however that not 
> all metadata maps are being cleaned, in particular stageIdToExecutorSummaries 
> could grow in an unbounded manner in a long running application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1432.


   Resolution: Fixed
Fix Version/s: 0.9.2
   1.0.0

> Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker
> -
>
> Key: SPARK-1432
> URL: https://issues.apache.org/jira/browse/SPARK-1432
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.9.0
>Reporter: Davis Shepherd
>Assignee: Davis Shepherd
> Fix For: 1.0.0, 0.9.2
>
>
> JobProgressTracker continuously cleans up old metadata as per the 
> spark.ui.retainedStages configuration parameter. It seems however that not 
> all metadata maps are being cleaned, in particular stageIdToExecutorSummaries 
> could grow in an unbounded manner in a long running application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1434) Make labelParser Java friendly.

2014-04-07 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-1434:


 Summary: Make labelParser Java friendly.
 Key: SPARK-1434
 URL: https://issues.apache.org/jira/browse/SPARK-1434
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Minor
 Fix For: 1.0.0


MLUtils#loadLibSVMData uses an anonymous function for the label parser. Java 
users won't like it. So I made a trait for LabelParser and provided two 
implementations: binary and multiclass.
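
A minimal sketch of what such a trait could look like; the names follow this description, but the implementation that actually lands may differ:

{code}
// Sketch only, not necessarily the shipped MLlib API.
trait LabelParser extends Serializable {
  /** Parse the label field of a LIBSVM line into a Double. */
  def parse(labelString: String): Double
}

/** Binary labels: anything positive maps to 1.0, everything else to 0.0. */
object BinaryLabelParser extends LabelParser {
  override def parse(labelString: String): Double =
    if (labelString.toDouble > 0) 1.0 else 0.0
}

/** Multiclass labels: keep the numeric label value as-is. */
object MulticlassLabelParser extends LabelParser {
  override def parse(labelString: String): Double = labelString.toDouble
}
{code}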



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1434) Make labelParser Java friendly.

2014-04-07 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-1434:
-

Component/s: MLlib

> Make labelParser Java friendly.
> ---
>
> Key: SPARK-1434
> URL: https://issues.apache.org/jira/browse/SPARK-1434
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
> Fix For: 1.0.0
>
>
> MLUtils#loadLibSVMData uses an anonymous function for the label parser. Java 
> users won't like it. So I made a trait for LabelParser and provided two 
> implementations: binary and multiclass.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962011#comment-13962011
 ] 

Sandeep Singh commented on SPARK-1433:
--

Sorry, a typo.

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> Mesos 0.14.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1433:
-

Description: 
Mesos 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0

  was:
HBase 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0


> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> Mesos 0.14.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1433:
-

Description: 
HBase 0.14.0 was released 6 months ago.
Upgrade Mesos dependency to 0.17.0

  was:
HBase 0.14.0 was released 6 months ago.
Upgrade HBase dependency to 0.17.0


> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> HBase 0.14.0 was released 6 months ago.
> Upgrade Mesos dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1223) Linear Regression (+ regularized variants)

2014-04-07 Thread Martin Jaggi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962005#comment-13962005
 ] 

Martin Jaggi commented on SPARK-1223:
-

This is resolved, right?

> Linear Regression (+ regularized variants)
> --
>
> Key: SPARK-1223
> URL: https://issues.apache.org/jira/browse/SPARK-1223
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
>
> Implement Linear regression using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962006#comment-13962006
 ] 

Sean Owen commented on SPARK-1433:
--

You mean Mesos?

> Upgrade Mesos dependency to 0.17.0
> --
>
> Key: SPARK-1433
> URL: https://issues.apache.org/jira/browse/SPARK-1433
> Project: Spark
>  Issue Type: Task
>Reporter: Sandeep Singh
>Priority: Minor
>
> HBase 0.14.0 was released 6 months ago.
> Upgrade HBase dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1221) SVMs (+ regularized variants)

2014-04-07 Thread Martin Jaggi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962008#comment-13962008
 ] 

Martin Jaggi commented on SPARK-1221:
-

This is resolved, right?

> SVMs (+ regularized variants)
> -
>
> Key: SPARK-1221
> URL: https://issues.apache.org/jira/browse/SPARK-1221
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
>
> Implement Support Vector Machines using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1222) Logistic Regression (+ regularized variants)

2014-04-07 Thread Martin Jaggi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962007#comment-13962007
 ] 

Martin Jaggi commented on SPARK-1222:
-

This is resolved, right?

> Logistic Regression (+ regularized variants)
> 
>
> Key: SPARK-1222
> URL: https://issues.apache.org/jira/browse/SPARK-1222
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Ameet Talwalkar
>Assignee: Shivaram Venkataraman
>
> Implement Logistic Regression using the SGD optimization primitives.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1217) Add proximal gradient updater.

2014-04-07 Thread Martin Jaggi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962004#comment-13962004
 ] 

Martin Jaggi commented on SPARK-1217:
-

The L1 updater is already proximal, as in the current code. Since it has no 
effect for L2, we could mark the issue as resolved for now.

> Add proximal gradient updater.
> --
>
> Key: SPARK-1217
> URL: https://issues.apache.org/jira/browse/SPARK-1217
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Ameet Talwalkar
>
> Add proximal gradient updater, in particular for L1 regularization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1433) Upgrade Mesos dependency to 0.17.0

2014-04-07 Thread Sandeep Singh (JIRA)
Sandeep Singh created SPARK-1433:


 Summary: Upgrade Mesos dependency to 0.17.0
 Key: SPARK-1433
 URL: https://issues.apache.org/jira/browse/SPARK-1433
 Project: Spark
  Issue Type: Task
Reporter: Sandeep Singh
Priority: Minor


HBase 0.14.0 was released 6 months ago.
Upgrade HBase dependency to 0.17.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (SPARK-1217) Add proximal gradient updater.

2014-04-07 Thread M J (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M J updated SPARK-1217:
---

Comment: was deleted

(was: The L1 updater is already proximal, as in the current code. Since it has 
no effect for L2, we could mark the issue as resolved for now.)

> Add proximal gradient updater.
> --
>
> Key: SPARK-1217
> URL: https://issues.apache.org/jira/browse/SPARK-1217
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Ameet Talwalkar
>
> Add proximal gradient updater, in particular for L1 regularization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1217) Add proximal gradient updater.

2014-04-07 Thread M J (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961993#comment-13961993
 ] 

M J commented on SPARK-1217:


The L1 updater is already proximal, as in the current code. Since it has no 
effect for L2, we could mark the issue as resolved for now.

> Add proximal gradient updater.
> --
>
> Key: SPARK-1217
> URL: https://issues.apache.org/jira/browse/SPARK-1217
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: Ameet Talwalkar
>
> Add proximal gradient updater, in particular for L1 regularization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (SPARK-1432) Potential memory leak in stageIdToExecutorSummaries in JobProgressTracker

2014-04-07 Thread Davis Shepherd (JIRA)
Davis Shepherd created SPARK-1432:
-

 Summary: Potential memory leak in stageIdToExecutorSummaries in 
JobProgressTracker
 Key: SPARK-1432
 URL: https://issues.apache.org/jira/browse/SPARK-1432
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.9.0
Reporter: Davis Shepherd


JobProgressTracker continuously cleans up old metadata as per the 
spark.ui.retainedStages configuration parameter. It seems however that not all 
metadata maps are being cleaned, in particular stageIdToExecutorSummaries could 
grow in an unbounded manner in a long running application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Closed] (SPARK-1420) The maven build error for Spark Catalyst

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo closed SPARK-1420.


Resolution: Fixed

> The maven build error for Spark Catalyst
> 
>
> Key: SPARK-1420
> URL: https://issues.apache.org/jira/browse/SPARK-1420
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: witgo
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine

2014-04-07 Thread Sandeep Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-1422:
-

Comment: was deleted

(was: It will be similar to ec2 script ?)

> Add scripts for launching Spark on Google Compute Engine
> 
>
> Key: SPARK-1422
> URL: https://issues.apache.org/jira/browse/SPARK-1422
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Matei Zaharia
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-1420) The maven build error for Spark Catalyst

2014-04-07 Thread witgo (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

witgo updated SPARK-1420:
-

Fix Version/s: 1.0.0

> The maven build error for Spark Catalyst
> 
>
> Key: SPARK-1420
> URL: https://issues.apache.org/jira/browse/SPARK-1420
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: witgo
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine

2014-04-07 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961922#comment-13961922
 ] 

Sandeep Singh commented on SPARK-1422:
--

It will be similar to ec2 script ?

> Add scripts for launching Spark on Google Compute Engine
> 
>
> Key: SPARK-1422
> URL: https://issues.apache.org/jira/browse/SPARK-1422
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Matei Zaharia
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1417) Spark on Yarn - spark UI link from resourcemanager is broken

2014-04-07 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961858#comment-13961858
 ] 

Thomas Graves commented on SPARK-1417:
--

https://github.com/apache/spark/pull/344

> Spark on Yarn - spark UI link from resourcemanager is broken
> 
>
> Key: SPARK-1417
> URL: https://issues.apache.org/jira/browse/SPARK-1417
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Blocker
>
> When running Spark on YARN in yarn-cluster mode, Spark registers a URL with 
> the YARN ResourceManager to point to the Spark UI.  This link is now broken. 
> The link should be something like: < resourcemanager >/proxy/< applicationId >
> but instead it's coming back as < resourcemanager >/< host of am:port >



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (SPARK-1371) HashAggregate should stream tuples and avoid doing an extra count

2014-04-07 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-1371.
-

Resolution: Fixed

> HashAggregate should stream tuples and avoid doing an extra count
> -
>
> Key: SPARK-1371
> URL: https://issues.apache.org/jira/browse/SPARK-1371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)