[jira] [Resolved] (SPARK-2510) word2vec: Distributed Representation of Words

2014-08-03 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2510.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1719
[https://github.com/apache/spark/pull/1719]

> word2vec: Distributed Representation of Words
> -
>
> Key: SPARK-2510
> URL: https://issues.apache.org/jira/browse/SPARK-2510
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Liquan Pei
>Assignee: Liquan Pei
> Fix For: 1.1.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> We would like to add a parallel implementation of word2vec to MLlib. word2vec 
> finds distributed representations of words by training on large data sets. We 
> will focus on the skip-gram model and hierarchical softmax in our initial 
> implementation. 
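
For readers who want to try this out, here is a minimal usage sketch of the new MLlib API (the input path is made up, and the class/method names assume the Word2Vec implementation merged by the pull request above):

{code}
import org.apache.spark.mllib.feature.Word2Vec

// Each element of the input RDD is one sentence, tokenized into words.
val sentences = sc.textFile("hdfs:///data/text8").map(_.split(" ").toSeq)

// Train a skip-gram model with hierarchical softmax.
val model = new Word2Vec().fit(sentences)

// Print the 5 words whose vectors are closest to "china".
model.findSynonyms("china", 5).foreach { case (word, similarity) =>
  println(s"$word $similarity")
}
{code}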






[jira] [Created] (SPARK-2823) GraphX jobs throw IllegalArgumentException

2014-08-03 Thread Lu Lu (JIRA)
Lu Lu created SPARK-2823:


 Summary: GraphX jobs throw IllegalArgumentException
 Key: SPARK-2823
 URL: https://issues.apache.org/jira/browse/SPARK-2823
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Reporter: Lu Lu


If users set "spark.default.parallelism" and its value differs from the EdgeRDD 
partition number, GraphX jobs throw an IllegalArgumentException:

14/07/26 21:06:51 WARN DAGScheduler: Creating new stage failed due to exception - job: 1
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
    at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:54)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:197)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:272)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:274)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:274)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
    at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:279)
    at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:219)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:672)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1184)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
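
A hypothetical reproduction sketch (the input path and partition counts are made up; the point is only that spark.default.parallelism disagrees with the number of edge partitions):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

val conf = new SparkConf()
  .setAppName("graphx-parallelism-repro")
  .set("spark.default.parallelism", "16")  // differs from the edge partition count below
val sc = new SparkContext(conf)

// Load the edge list into 4 partitions while spark.default.parallelism is 16.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt", minEdgePartitions = 4)

// Operations that zip vertex and edge partitions may then fail with
// "Can't zip RDDs with unequal numbers of partitions".
graph.connectedComponents().vertices.count()
{code}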






[jira] [Issue Comment Deleted] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Athira Das updated SPARK-2820:
--

Comment: was deleted

(was: sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 
GROUP BY id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] 
and so on, but instead I am getting output like this:

[1,2,34], [2,2,45] and so on

I am not able to get the id properly; some random values are populated instead.)

> Group by query not returning random values
> --
>
> Key: SPARK-2820
> URL: https://issues.apache.org/jira/browse/SPARK-2820
> Project: Spark
>  Issue Type: Question
>Reporter: Athira Das
>
> sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP 
> BY id, month").
> For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] 
> and so on, but instead I am getting output like this:
> [1,2,34], [2,2,45] and so on
> I am not able to get the id properly; some random values are populated instead.






[jira] [Reopened] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Athira Das reopened SPARK-2820:
---


> Group by query not returning random values
> --
>
> Key: SPARK-2820
> URL: https://issues.apache.org/jira/browse/SPARK-2820
> Project: Spark
>  Issue Type: Question
>Reporter: Athira Das
>







[jira] [Commented] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084365#comment-14084365
 ] 

Athira Das commented on SPARK-2820:
---

sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY 
id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] 
and so on, but instead I am getting output like this:

[1,2,34], [2,2,45] and so on

I am not able to get the id properly; some random values are populated instead.

> Group by query not returning random values
> --
>
> Key: SPARK-2820
> URL: https://issues.apache.org/jira/browse/SPARK-2820
> Project: Spark
>  Issue Type: Question
>Reporter: Athira Das
>







[jira] [Updated] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Athira Das updated SPARK-2820:
--

Description: 
sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY 
id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] 
and so on, but instead I am getting output like this:
[1,2,34], [2,2,45] and so on
I am not able to get the id properly; some random values are populated instead.
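
For context, a self-contained sketch of the kind of setup the query assumes, using the Spark 1.0.x API (the schema and input format are guesses based on the query):

{code}
import org.apache.spark.sql.SQLContext

case class Record(id: String, month: Int, marks: Double)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

// Hypothetical input file with lines like "id_1,2,50.0".
val records = sc.textFile("marks.csv")
  .map(_.split(","))
  .map(r => Record(r(0), r(1).toInt, r(2).toDouble))
records.registerAsTable("data")

sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks > 25 GROUP BY id, month")
  .collect()
  .foreach(println)
{code}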

> Group by query not returning random values
> --
>
> Key: SPARK-2820
> URL: https://issues.apache.org/jira/browse/SPARK-2820
> Project: Spark
>  Issue Type: Question
>Reporter: Athira Das
>
> sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP 
> BY id, month").
> For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] 
> and so on, but instead I am getting output like this:
> [1,2,34], [2,2,45] and so on
> I am not able to get the id properly; some random values are populated instead.






[jira] [Created] (SPARK-2822) Group by returning random values in SparkSQL

2014-08-03 Thread Athira Das (JIRA)
Athira Das created SPARK-2822:
-

 Summary: Group by returning random values in SparkSQL 
 Key: SPARK-2822
 URL: https://issues.apache.org/jira/browse/SPARK-2822
 Project: Spark
  Issue Type: Question
Reporter: Athira Das









[jira] [Created] (SPARK-2821) Group by returning random values in Spark SQL. While running the query sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month")

2014-08-03 Thread Athira Das (JIRA)
Athira Das created SPARK-2821:
-

 Summary: Group by returning random values in Spark SQL. While 
running the query sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE 
marks>25 GROUP BY id, month")
 Key: SPARK-2821
 URL: https://issues.apache.org/jira/browse/SPARK-2821
 Project: Spark
  Issue Type: Question
Reporter: Athira Das









[jira] [Closed] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Athira Das closed SPARK-2820.
-

Resolution: Fixed

> Group by query not returning random values
> --
>
> Key: SPARK-2820
> URL: https://issues.apache.org/jira/browse/SPARK-2820
> Project: Spark
>  Issue Type: Question
>Reporter: Athira Das
>







[jira] [Comment Edited] (SPARK-2812) convert maven to archetype based build

2014-08-03 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084347#comment-14084347
 ] 

Prashant Sharma edited comment on SPARK-2812 at 8/4/14 6:17 AM:


What do you mean by an archetype-based build? Also, why can't we just ignore the 
Maven warnings and have expressions in the build names?


was (Author: prashant_):
What do you mean by an archetype-based build? Can you explain what you mean by 
it? Also, why can't we just ignore the Maven warnings and have expressions in 
the build names?

> convert maven to archetype based build
> --
>
> Key: SPARK-2812
> URL: https://issues.apache.org/jira/browse/SPARK-2812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> In order to support Scala 2.10 and 2.11 parallel builds: a build profile in 
> pom.xml is insufficient, as it is not possible to have expressions/variables in 
> the artifact names of sub-modules.






[jira] [Created] (SPARK-2820) Group by query not returning random values

2014-08-03 Thread Athira Das (JIRA)
Athira Das created SPARK-2820:
-

 Summary: Group by query not returning random values
 Key: SPARK-2820
 URL: https://issues.apache.org/jira/browse/SPARK-2820
 Project: Spark
  Issue Type: Question
Reporter: Athira Das









[jira] [Commented] (SPARK-2812) convert maven to archetype based build

2014-08-03 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084347#comment-14084347
 ] 

Prashant Sharma commented on SPARK-2812:


What do you mean by an archetype-based build? Can you explain what you mean by 
it? Also, why can't we just ignore the Maven warnings and have expressions in 
the build names?

> convert maven to archetype based build
> --
>
> Key: SPARK-2812
> URL: https://issues.apache.org/jira/browse/SPARK-2812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>
> In order to support Scala 2.10 and 2.11 parallel builds: a build profile in 
> pom.xml is insufficient, as it is not possible to have expressions/variables in 
> the artifact names of sub-modules.






[jira] [Updated] (SPARK-2818) Improve joinning RDDs that transformed from the same cached RDD

2014-08-03 Thread Lu Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Lu updated SPARK-2818:
-

Component/s: Spark Core
Description: 
If the RDDs being joined originate from the same cached RDD a, the DAGScheduler 
will submit redundant stages to compute and cache RDD a.
For example:

val edges = sc.textFile(...).cache()
val bigSrc = edges.groupByKey().filter(...)
val reversed = edges.map(edge => (edge._2, edge._1))
val bigDst = reversed.groupByKey().filter(...)
bigSrc.join(bigDst).count

The final count action triggers two stages that both compute the edges RDD. 
This results in two performance problems:
(1) If resources are sufficient, the two stages run concurrently and read the 
same HDFS file at the same time.
(2) If the two stages run one after the other, the tasks of the latter stage 
can read the cached blocks of the edges RDD directly, but the latter stage 
cannot achieve data locality, because the block locations are not known when 
the stages are submitted.
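
(Not part of this report, but mirroring the pseudo-code above: the duplicate computation can usually be sidestepped today by materializing the cached RDD before the two downstream stages are submitted.)

{code}
val edges = sc.textFile(...).cache()
edges.count()  // force edges to be computed and cached once, up front

val bigSrc = edges.groupByKey().filter(...)
val bigDst = edges.map(edge => (edge._2, edge._1)).groupByKey().filter(...)
bigSrc.join(bigDst).count
{code}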

> Improve joinning RDDs that transformed from the same cached RDD
> ---
>
> Key: SPARK-2818
> URL: https://issues.apache.org/jira/browse/SPARK-2818
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Lu Lu
>
> If the RDDs being joined originate from the same cached RDD a, the 
> DAGScheduler will submit redundant stages to compute and cache RDD a.
> For example:
> val edges = sc.textFile(...).cache()
> val bigSrc = edges.groupByKey().filter(...)
> val reversed = edges.map(edge => (edge._2, edge._1))
> val bigDst = reversed.groupByKey().filter(...)
> bigSrc.join(bigDst).count
> The final count action triggers two stages that both compute the edges RDD. 
> This results in two performance problems:
> (1) If resources are sufficient, the two stages run concurrently and read the 
> same HDFS file at the same time.
> (2) If the two stages run one after the other, the tasks of the latter stage 
> can read the cached blocks of the edges RDD directly, but the latter stage 
> cannot achieve data locality, because the block locations are not known when 
> the stages are submitted.






[jira] [Created] (SPARK-2819) Difficult to turn on intercept with linear models

2014-08-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-2819:
-

 Summary: Difficult to turn on intercept with linear models
 Key: SPARK-2819
 URL: https://issues.apache.org/jira/browse/SPARK-2819
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Sandy Ryza


If I want to train a logistic regression model with default parameters and 
include an intercept, I can run:
val alg = new LogisticRegressionWithSGD()
alg.setIntercept(true)
alg.run(data)

but if I want to set a parameter like numIterations, I need to use
LogisticRegressionWithSGD.train(data, 50)
and have no opportunity to turn on the intercept.
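
For reference, a workaround sketch with the current API, relying on the public optimizer field of LogisticRegressionWithSGD rather than the static train() helpers:

{code}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val alg = new LogisticRegressionWithSGD()
alg.setIntercept(true)               // turn on the intercept
alg.optimizer.setNumIterations(50)   // set numIterations without train(data, 50)
val model = alg.run(data)            // data: RDD[LabeledPoint]
{code}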







[jira] [Created] (SPARK-2818) Improve joinning RDDs that transformed from the same cached RDD

2014-08-03 Thread Lu Lu (JIRA)
Lu Lu created SPARK-2818:


 Summary: Improve joinning RDDs that transformed from the same 
cached RDD
 Key: SPARK-2818
 URL: https://issues.apache.org/jira/browse/SPARK-2818
 Project: Spark
  Issue Type: Improvement
Reporter: Lu Lu









[jira] [Commented] (SPARK-2817) add "show create table" support

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084308#comment-14084308
 ] 

Apache Spark commented on SPARK-2817:
-

User 'tianyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/1760

> add  "show create table" support
> 
>
> Key: SPARK-2817
> URL: https://issues.apache.org/jira/browse/SPARK-2817
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Yi Tian
>Priority: Minor
>
> In the Spark SQL component, the "show create table" syntax has been disabled.
> We think it is a useful function for describing a Hive table.






[jira] [Resolved] (SPARK-2272) Feature scaling which standardizes the range of independent variables or features of data.

2014-08-03 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2272.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1207
[https://github.com/apache/spark/pull/1207]

> Feature scaling which standardizes the range of independent variables or 
> features of data.
> --
>
> Key: SPARK-2272
> URL: https://issues.apache.org/jira/browse/SPARK-2272
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: DB Tsai
>Assignee: DB Tsai
> Fix For: 1.1.0
>
>
> Feature scaling is a method used to standardize the range of independent 
> variables or features of data. In data processing, it is also known as data 
> normalization and is generally performed during the data preprocessing step.
> In this work, a trait called `VectorTransformer` is defined for generic 
> transformation of a vector. It contains two methods: `apply`, which applies a 
> transformation to a vector, and `unapply`, which applies the inverse 
> transformation to a vector.
> There are three concrete implementations of `VectorTransformer`, and they all 
> can be easily extended with PMML transformation support. 
> 1) `VectorStandardizer` - Standardizes a vector given the mean and variance. 
> Since the standardization will densify the output, the output is always in 
> dense vector format.
> 2) `VectorRescaler` - Rescales a vector into a target range specified by a 
> tuple of two double values, or by two vectors as the new target minimum and 
> maximum. Since the rescaling subtracts the minimum of each column first, the 
> output will always be a dense vector regardless of the input vector type.
> 3) `VectorDivider` - Transforms a vector by dividing it by a constant, or by 
> another vector element-wise. This transformation preserves the type of the 
> input vector without densifying the result.
> Utility helper methods take an RDD[Vector] as input and return the transformed 
> RDD[Vector] together with the transformer, for dividing, rescaling, 
> normalization, and standardization. 
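
A sketch of the trait shape the description outlines (names and signatures are illustrative, not the merged code):

{code}
import org.apache.spark.mllib.linalg.Vector

trait VectorTransformer {
  /** Applies the transformation to a vector. */
  def apply(v: Vector): Vector

  /** Applies the inverse transformation to a vector. */
  def unapply(v: Vector): Vector
}
{code}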






[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread pengyanhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084287#comment-14084287
 ] 

pengyanhong commented on SPARK-2815:


I changed the YarnAllocationHandler.scala file as below:

import org.apache.hadoop.yarn.api.records.ApplicationAttemptId
val amResp = allocateExecutorResources(executorsToRequest)

It then compiles successfully and works on a YARN cluster, but I am not sure 
whether there are potential problems.
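
Reading this change against the compile errors quoted later in this thread, the edit appears to amount to the following before/after (a sketch, not verified against the actual file):

{code}
// Before (fails to compile against hadoop 2.0.0-cdh4.5.0):
import org.apache.hadoop.yarn.api.records.{AMResponse, ApplicationAttemptId}
val amResp = allocateExecutorResources(executorsToRequest).getAMResponse

// After:
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId
val amResp = allocateExecutorResources(executorsToRequest)
{code}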

> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly, finally giving the error message [error] 
> (yarn-stable/compile:compile) Compilation failed. The following is the detailed 
> error on the console:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410

[jira] [Created] (SPARK-2817) add "show create table" support

2014-08-03 Thread Yi Tian (JIRA)
Yi Tian created SPARK-2817:
--

 Summary: add  "show create table" support
 Key: SPARK-2817
 URL: https://issues.apache.org/jira/browse/SPARK-2817
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Yi Tian
Priority: Minor


In the Spark SQL component, the "show create table" syntax has been disabled.
We think it is a useful function for describing a Hive table.
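
Once re-enabled, usage from a HiveContext would presumably look like this (the table name is illustrative):

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.hql("SHOW CREATE TABLE src").collect().foreach(println)
{code}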






[jira] [Commented] (SPARK-2816) Type-safe SQL queries

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084257#comment-14084257
 ] 

Apache Spark commented on SPARK-2816:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1759

> Type-safe SQL queries
> -
>
> Key: SPARK-2816
> URL: https://issues.apache.org/jira/browse/SPARK-2816
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>







[jira] [Created] (SPARK-2816) Type-safe SQL queries

2014-08-03 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-2816:
---

 Summary: Type-safe SQL queries
 Key: SPARK-2816
 URL: https://issues.apache.org/jira/browse/SPARK-2816
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust









[jira] [Closed] (SPARK-2744) The configuration "spark.history.retainedApplications" is invalid

2014-08-03 Thread meiyoula (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

meiyoula closed SPARK-2744.
---

Resolution: Not a Problem

> The configuration "spark.history.retainedApplications" is invalid
> -
>
> Key: SPARK-2744
> URL: https://issues.apache.org/jira/browse/SPARK-2744
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: meiyoula
>  Labels: historyserver
>
> When I set it in spark-env.sh like this: export 
> SPARK_HISTORY_OPTS=$SPARK_HISTORY_OPTS" -Dspark.history.ui.port=5678 
> -Dspark.history.retainedApplications=1 ", the history server web UI retains 
> more than one application.






[jira] [Commented] (SPARK-2583) ConnectionManager cannot distinguish whether error occurred or not

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084195#comment-14084195
 ] 

Apache Spark commented on SPARK-2583:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1758

> ConnectionManager cannot distinguish whether error occurred or not
> --
>
> Key: SPARK-2583
> URL: https://issues.apache.org/jira/browse/SPARK-2583
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> ConnectionManager#handleMessage sends empty messages to the other peer whether 
> or not an error occurred in onReceiveCallback.
> {code}
>  val ackMessage = if (onReceiveCallback != null) {
> logDebug("Calling back")
> onReceiveCallback(bufferMessage, connectionManagerId)
>   } else {
> logDebug("Not calling back as callback is null")
> None
>   }
>   if (ackMessage.isDefined) {
> if (!ackMessage.get.isInstanceOf[BufferMessage]) {
>   logDebug("Response to " + bufferMessage + " is not a buffer 
> message, it is of type "
> + ackMessage.get.getClass)
> } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) {
>   logDebug("Response to " + bufferMessage + " does not have ack 
> id set")
>   ackMessage.get.asInstanceOf[BufferMessage].ackId = 
> bufferMessage.id
> }
>   }
> // We have no way to tell peer whether error occurred or not
>   sendMessage(connectionManagerId, ackMessage.getOrElse {
> Message.createBufferMessage(bufferMessage.id)
>   })
> }
> {code}






[jira] [Resolved] (SPARK-2810) update scala-maven-plugin to version 3.2.0

2014-08-03 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2810.


  Resolution: Fixed
   Fix Version/s: 1.1.0
Target Version/s: 1.1.0

Fixed by:
https://github.com/apache/spark/pull/1711

> update scala-maven-plugin to version 3.2.0
> --
>
> Key: SPARK-2810
> URL: https://issues.apache.org/jira/browse/SPARK-2810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>Assignee: Anand Avati
> Fix For: 1.1.0
>
>
> Needed for Scala 2.11 'compiler-interface'






[jira] [Updated] (SPARK-2810) update scala-maven-plugin to version 3.2.0

2014-08-03 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2810:
---

Assignee: Anand Avati

> update scala-maven-plugin to version 3.2.0
> --
>
> Key: SPARK-2810
> URL: https://issues.apache.org/jira/browse/SPARK-2810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Spark Core
>Reporter: Anand Avati
>Assignee: Anand Avati
> Fix For: 1.1.0
>
>
> Needed for Scala 2.11 'compiler-interface'






[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084164#comment-14084164
 ] 

Apache Spark commented on SPARK-1981:
-

User 'cfregly' has created a pull request for this issue:
https://github.com/apache/spark/pull/1757

> Add AWS Kinesis streaming support
> -
>
> Key: SPARK-1981
> URL: https://issues.apache.org/jira/browse/SPARK-1981
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.1.0
>
>
> Add AWS Kinesis support to Spark Streaming.
> Initial discussion occurred here: https://github.com/apache/spark/pull/223
> I discussed this with Parviz from AWS recently and we agreed that I would 
> take this over.
> Look for a new PR that takes into account all the feedback from the earlier 
> PR including spark-1.0-compliant implementation, AWS-license-aware build 
> support, tests, comments, and style guide compliance.






[jira] [Resolved] (SPARK-1740) Pyspark cancellation kills unrelated pyspark workers

2014-08-03 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-1740.
---

   Resolution: Fixed
Fix Version/s: 1.1.0

> Pyspark cancellation kills unrelated pyspark workers
> 
>
> Key: SPARK-1740
> URL: https://issues.apache.org/jira/browse/SPARK-1740
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.0.0
>Reporter: Aaron Davidson
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.1.0
>
>
> PySpark cancellation calls SparkEnv#destroyPythonWorker. Since there is one 
> python worker per process, this would seem like a sensible thing to do. 
> Unfortunately, this method actually destroys a python daemon, and all 
> associated workers, which generally means that we can cause failures in 
> unrelated Pyspark jobs.
> The severity of this bug is limited by the fact that the Pyspark daemon is 
> easily recreated, so the tasks will succeed after being restarted.






[jira] [Updated] (SPARK-2360) CSV import to SchemaRDDs

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2360:


Target Version/s: 1.2.0  (was: 1.1.0)

> CSV import to SchemaRDDs
> 
>
> Key: SPARK-2360
> URL: https://issues.apache.org/jira/browse/SPARK-2360
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Hossein Falaki
>Priority: Minor
>
> I think the first step is to design the interface that we want to present to 
> users.  Mostly this is defining options when importing.  Off the top of my 
> head:
> - What is the separator?
> - Provide column names or infer them from the first row.
> - how to handle multiple files with possibly different schemas
> - do we have a method to let users specify the datatypes of the columns or 
> are they just strings?
> - what types of quoting / escaping do we want to support?






[jira] [Resolved] (SPARK-2783) Basic support for analyze in HiveContext

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2783.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> Basic support for analyze in HiveContext
> 
>
> Key: SPARK-2783
> URL: https://issues.apache.org/jira/browse/SPARK-2783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Yin Huai
>Priority: Blocker
> Fix For: 1.1.0
>
>







[jira] [Updated] (SPARK-2360) CSV import to SchemaRDDs

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2360:


Priority: Major  (was: Minor)

> CSV import to SchemaRDDs
> 
>
> Key: SPARK-2360
> URL: https://issues.apache.org/jira/browse/SPARK-2360
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Hossein Falaki
>
> I think the first step is to design the interface that we want to present to 
> users.  Mostly this is defining options when importing.  Off the top of my 
> head:
> - What is the separator?
> - Provide column names or infer them from the first row.
> - how to handle multiple files with possibly different schemas
> - do we have a method to let users specify the datatypes of the columns or 
> are they just strings?
> - what types of quoting / escaping do we want to support?






[jira] [Resolved] (SPARK-2752) spark sql cli should not exit when get a exception

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2752.
-

  Resolution: Fixed
Target Version/s: 1.1.0

> spark sql cli should not exit when get a exception
> --
>
> Key: SPARK-2752
> URL: https://issues.apache.org/jira/browse/SPARK-2752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: wangfei
> Fix For: 1.1.0
>
>







[jira] [Resolved] (SPARK-2784) Make language configurable using SQLConf instead of hql/sql functions

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2784.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

> Make language configurable using SQLConf instead of hql/sql functions
> -
>
> Key: SPARK-2784
> URL: https://issues.apache.org/jira/browse/SPARK-2784
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 1.1.0
>
>







[jira] [Resolved] (SPARK-2814) HiveThriftServer throws NPE when executing native commands

2014-08-03 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2814.
-

   Resolution: Fixed
Fix Version/s: 1.1.0
 Assignee: Cheng Lian

> HiveThriftServer throws NPE when executing native commands
> --
>
> Key: SPARK-2814
> URL: https://issues.apache.org/jira/browse/SPARK-2814
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>Assignee: Cheng Lian
> Fix For: 1.1.0
>
>
> After [PR #1686|https://github.com/apache/spark/pull/1686], 
> {{HiveThriftServer2}} throws an exception when executing native commands.
> The reason is that initialization of {{HiveContext.sessionState.out}} and 
> {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
> uses an overridden version of {{HiveContext}} that doesn't know how to 
> initialize these two streams. When {{HiveContext.runHive}} tries to write to 
> {{HiveContext.sessionState.out}}, an NPE is thrown.
> Reproduction steps:
> # Start HiveThriftServer2
> # Connect to it via beeline
> # Execute `set;`
> Exception thrown:
> {code}
> ==
> HIVE FAILURE OUTPUT
> ==
> ==
> END HIVE FAILURE OUTPUT
> ==
> 14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
> at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
> at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
> ...
> {code}






[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-03 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084057#comment-14084057
 ] 

Xiangrui Meng commented on SPARK-1997:
--

It's fine within Spark. If we add breeze-0.8.1 with scalalogging-2.1.1, users 
may have trouble using Spark with their own library if it depends on 
scalalogging-1.0.1. This is why we removed scalalogging dependency from Spark 
SQL, so there is no reason to add it back, no matter which version it is. David 
already merged the PR that removes scalalogging from breeze. We are now waiting 
for him to help cut a new release of breeze, without scalalogging.
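
To illustrate the kind of conflict being described, a hypothetical user build might pin the older scalalogging alongside Spark (artifact coordinates are illustrative):

{code}
// build.sbt fragment (hypothetical). If spark-mllib pulled in a breeze that
// depends on scalalogging 2.x, it would clash with the user's own 1.0.1 pin.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib"        % "1.1.0",
  "com.typesafe"     %% "scalalogging-slf4j" % "1.0.1"
)
{code}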

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .






[jira] [Resolved] (SPARK-2197) Spark invoke DecisionTree by Java

2014-08-03 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2197.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1740
[https://github.com/apache/spark/pull/1740]

> Spark invoke DecisionTree by Java
> -
>
> Key: SPARK-2197
> URL: https://issues.apache.org/jira/browse/SPARK-2197
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Reporter: wulin
>Assignee: Joseph K. Bradley
> Fix For: 1.1.0
>
>
> Strategy strategy = new Strategy(Algo.Classification(), new Impurity() {
>   @Override
>   public double calculate(double arg0, double arg1, 
> double arg2) {
>   return Gini.calculate(arg0, arg1, arg2);
>   }
>   @Override
>   public double calculate(double arg0, double arg1) {
>   return Gini.calculate(arg0, arg1);
>   }
>   }, 5, 100, QuantileStrategy.Sort(), null, 256);
>   DecisionTree decisionTree = new DecisionTree(strategy);
>   final DecisionTreeModel decisionTreeModel = 
> decisionTree.train(labeledPoints.rdd());
> I tried to run it on Spark, but got the following error on the console:
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to 
> [Lorg.apache.spark.mllib.regression.LabeledPoint;
>   at 
> org.apache.spark.mllib.tree.DecisionTree$.findSplitsBins(DecisionTree.scala:990)
>   at org.apache.spark.mllib.tree.DecisionTree.train(DecisionTree.scala:56)
>   at 
> org.project.modules.spark.java.SparkDecisionTree.main(SparkDecisionTree.java:75)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Looking at the source code, I found
> val numFeatures = input.take(1)(0).features.size
> which is where the problem is.
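
For comparison, the equivalent call from Scala, where the element type is statically RDD[LabeledPoint] (a sketch; the ClassCastException above is most likely caused by the Object-typed array handed over from the Java API):

{code}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.tree.impurity.Gini
import org.apache.spark.mllib.util.MLUtils

// Illustrative input path; any RDD[LabeledPoint] works here.
val labeledPoints = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

// Same settings as the Java snippet above: Gini impurity, maxDepth = 5.
val model = DecisionTree.train(labeledPoints, Algo.Classification, Gini, 5)
{code}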






[jira] [Resolved] (SPARK-2246) Add user-data option to EC2 scripts

2014-08-03 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2246.


   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 1186
[https://github.com/apache/spark/pull/1186]

> Add user-data option to EC2 scripts
> ---
>
> Key: SPARK-2246
> URL: https://issues.apache.org/jira/browse/SPARK-2246
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>Assignee: Allan Douglas R. de Oliveira
> Fix For: 1.1.0
>
>
> EC2 servers can use a "user-data" script for custom startup/initialization of 
> machines. The EC2 scripts should provide an option to set this.






[jira] [Updated] (SPARK-2246) Add user-data option to EC2 scripts

2014-08-03 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2246:
---

Assignee: Allan Douglas R. de Oliveira

> Add user-data option to EC2 scripts
> ---
>
> Key: SPARK-2246
> URL: https://issues.apache.org/jira/browse/SPARK-2246
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Allan Douglas R. de Oliveira
>Assignee: Allan Douglas R. de Oliveira
>
> EC2 servers can use a "user-data" script for custom startup/initialization of 
> machines. The EC2 scripts should provide an option to set this.






[jira] [Resolved] (SPARK-2712) Add a small note that mvn "package" must happen before "test"

2014-08-03 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2712.


Resolution: Fixed

Issue resolved by pull request 1615
[https://github.com/apache/spark/pull/1615]

> Add a small note that mvn "package" must happen before "test"
> -
>
> Key: SPARK-2712
> URL: https://issues.apache.org/jira/browse/SPARK-2712
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 0.9.1, 1.0.0, 1.1.1
> Environment: all
>Reporter: Stephen Boesch
>Assignee: Stephen Boesch
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.1.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add to the building-with-maven.md:
> Requirement: build packages before running tests
> Tests must be run AFTER the "package" target has already been executed. The 
> following is an example of a correct (build, test) sequence:
> mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package
> mvn -Pyarn -Phadoop-2.3 -Phive test
> BTW Reynold Xin requested this tiny doc improvement.






[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084034#comment-14084034
 ] 

Guoqiang Li commented on SPARK-2815:


Currently {{yarn-alpha}} does not support version {{2.0.0-cdh4.5.0}}, but it 
seems to support version {{2.0.0-cdh4.2.0}}.
{{2.0.0-cdh4.5.0}} gives the following error:
{noformat}
[ERROR] 
/Users/witgo/work/code/java/spark/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:36:
 object AMResponse is not a member of package org.apache.hadoop.yarn.api.records
[ERROR] import org.apache.hadoop.yarn.api.records.{AMResponse, 
ApplicationAttemptId}
[ERROR]^
[ERROR] 
/Users/witgo/work/code/java/spark/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:114:
 value getAMResponse is not a member of 
org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
[ERROR] val amResp = 
allocateExecutorResources(executorsToRequest).getAMResponse
{noformat}


> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly, finally giving the error message [error] 
> (yarn-stable/compile:compile) Compilation failed. The following is the detailed 
> error on the console:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy

[jira] [Commented] (SPARK-1335) Also increase perm gen / code cache for scalatest when invoked via Maven build

2014-08-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084026#comment-14084026
 ] 

Guoqiang Li commented on SPARK-1335:


The problem also appears in branch 1.1. The following command fails:
{{mvn -Pyarn-alpha -Phive -Dhadoop.version=2.0.0-cdh4.5.0 -DskipTests package}}
I'm on Java 6 / OS X 10.9.4.

> Also increase perm gen / code cache for scalatest when invoked via Maven build
> --
>
> Key: SPARK-1335
> URL: https://issues.apache.org/jira/browse/SPARK-1335
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 0.9.0
>Reporter: Sean Owen
>Assignee: Sean Owen
> Fix For: 1.0.0
>
>
> I am observing build failures when the Maven build reaches tests in the new 
> SQL components. (I'm on Java 7 / OSX 10.9.) The failure is the usual complaint 
> from Scala that it's out of permgen space, or that the JIT is out of code 
> cache space.
> I see that various build scripts increase these both for SBT. This change 
> simply adds these settings to scalatest's arguments. Works for me and seems a 
> bit more consistent.
> (In the PR I'm going to tack on some other little changes too -- see PR.)






[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support

2014-08-03 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084025#comment-14084025
 ] 

Nicholas Chammas commented on SPARK-1981:
-

Word. Thanks for the clarification!

> Add AWS Kinesis streaming support
> -
>
> Key: SPARK-1981
> URL: https://issues.apache.org/jira/browse/SPARK-1981
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Chris Fregly
>Assignee: Chris Fregly
> Fix For: 1.1.0
>
>
> Add AWS Kinesis support to Spark Streaming.
> Initial discussion occurred here: https://github.com/apache/spark/pull/223
> I discussed this with Parviz from AWS recently and we agreed that I would 
> take this over.
> Look for a new PR that takes into account all the feedback from the earlier 
> PR including spark-1.0-compliant implementation, AWS-license-aware build 
> support, tests, comments, and style guide compliance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084009#comment-14084009
 ] 

Sean Owen commented on SPARK-2815:
--

Your build command is out of date; SPARK_HADOOP_VERSION et al. are deprecated. 
You should build with Maven, but SBT should work too. [~gq]'s command looks 
correct. See http://spark.apache.org/docs/latest/building-with-maven.html, which 
documents this.


> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
> (yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
> console are:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410:
>  value CONTAINER_ID is not a member of ob

[jira] [Comment Edited] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084006#comment-14084006
 ] 

Guoqiang Li edited comment on SPARK-2815 at 8/3/14 3:10 PM:


[~pengyanhong] You can try this {{./sbt/sbt clean assembly -Pyarn-alpha -Phive 
-Dhadoop.version=2.0.0-cdh4.5.0}}



was (Author: gq):
[~pengyanhong] You can try this first {{./sbt/sbt clean assembly -Pyarn-alpha 
-Phive -Dhadoop.version=2.0.0-cdh4.5.0}}


> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
> (yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
> console are:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.sc

[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084006#comment-14084006
 ] 

Guoqiang Li commented on SPARK-2815:


[~pengyanhong] You can try this first {{./sbt/sbt clean assembly -Pyarn-alpha 
-Phive -Dhadoop.version=2.0.0-cdh4.5.0}}


> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
> (yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
> console are:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410:
>  value CONTAINER_ID is not a member of object 
> org.apache.hadoop.yarn.api.ApplicationConstants.Environment
> [error] val containerIdString = 
> System.getenv(Applica

[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084000#comment-14084000
 ] 

Apache Spark commented on SPARK-2815:
-

User 'witgo' has created a pull request for this issue:
https://github.com/apache/spark/pull/1754

> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
> (yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
> console are:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410:
>  value CONTAINER_ID is not a member of object 
> org.apache.hadoop.yarn.api.ApplicationConstants.Environment
> [error] val containerIdString = 
> System.getenv(ApplicationConstants.Environ

[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083998#comment-14083998
 ] 

Guoqiang Li commented on SPARK-2815:


I also encountered this bug.
PR: https://github.com/apache/spark/pull/1754

> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> -
>
> Key: SPARK-2815
> URL: https://issues.apache.org/jira/browse/SPARK-2815
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.1.0
>Reporter: pengyanhong
>Assignee: Guoqiang Li
>Priority: Blocker
>
> Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
> SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
> (yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
> console are:
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
>  not found: value YarnClient
> [error]   val yarnClient = YarnClient.createYarnClient
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
>  object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
>  value RM_AM_MAX_ATTEMPTS is not a member of object 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error]   ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
>  not found: type AMRMClient
> [error]   private var amClient: AMRMClient[ContainerRequest] = _
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
>  not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error]^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
>  not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
>  object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error]  ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
>  not found: type AMRMClient
> [error]   amClient: AMRMClient[ContainerRequest],
> [error] ^
> [error] 
> /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410:
>  value CONTAINER_ID is not a member of object 
> org.apache.hadoop.yarn.api.ApplicationConstants.Environment
> [error] val containerIdString = 
> System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name

[jira] [Created] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0

2014-08-03 Thread pengyanhong (JIRA)
pengyanhong created SPARK-2815:
--

 Summary: Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
 Key: SPARK-2815
 URL: https://issues.apache.org/jira/browse/SPARK-2815
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.1.0
Reporter: pengyanhong
Priority: Blocker


Compilation fails via {{SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true 
SPARK_HIVE=true sbt/sbt assembly}}; the build finally reports [error] 
(yarn-stable/compile:compile) Compilation failed. The detailed errors on the 
console are:
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26:
 object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.YarnClient
[error]  ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40:
 not found: value YarnClient
[error]   val yarnClient = YarnClient.createYarnClient
[error]^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32:
 object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient
[error]  ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33:
 object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
[error]  ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36:
 object util is not a member of package org.apache.hadoop.yarn.webapp
[error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
[error]  ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64:
 value RM_AM_MAX_ATTEMPTS is not a member of object 
org.apache.hadoop.yarn.conf.YarnConfiguration
[error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, 
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
[error]   ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66:
 not found: type AMRMClient
[error]   private var amClient: AMRMClient[ContainerRequest] = _
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92:
 not found: value AMRMClient
[error] amClient = AMRMClient.createAMRMClient()
[error]^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137:
 not found: value WebAppUtils
[error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40:
 object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient
[error]  ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618:
 not found: type AMRMClient
[error]   amClient: AMRMClient[ContainerRequest],
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596:
 not found: type AMRMClient
[error]   amClient: AMRMClient[ContainerRequest],
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577:
 not found: type AMRMClient
[error]   amClient: AMRMClient[ContainerRequest],
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410:
 value CONTAINER_ID is not a member of object 
org.apache.hadoop.yarn.api.ApplicationConstants.Environment
[error] val containerIdString = 
System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
[error] 
   ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:128:
 value setTokens is not a member of 
org.apache.hadoop.yarn.api.records.ContainerLaunchContext
[error] amContainer.setTokens(ByteBuffer.wrap(dob.getData()))
[error] ^
[error] 
/Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.

[jira] [Updated] (SPARK-2814) HiveThriftServer throws NPE when executing native commands

2014-08-03 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-2814:
--

Description: 
After [PR #1686|https://github.com/apache/spark/pull/1686], 
{{HiveThriftServer2}} throws an exception when executing native commands.

The reason is that the initialization of {{HiveContext.sessionState.out}} and 
{{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
uses an overridden version of {{HiveContext}} that doesn't know how to 
initialize these two streams. When {{HiveContext.runHive}} tries to write to 
{{HiveContext.sessionState.out}}, an NPE is thrown.

Reproduction steps:

# Start HiveThriftServer2
# Connect to it via beeline
# Execute `set;`

Exception thrown:
{code}
==
HIVE FAILURE OUTPUT
==


==
END HIVE FAILURE OUTPUT
==

14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
...
{code}

  was:
After [PR #1686|https://github.com/apache/spark/pull/1686], 
{{HiveThriftServer2}} throws an exception when executing native commands.

The reason is that the initialization of {{HiveContext.sessionState.out}} and 
{{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
uses an overridden version of {{HiveContext}} that doesn't know how to 
initialize these two streams.

Reproduction steps:

# Start HiveThriftServer2
# Connect to it via beeline
# Execute `set;`

Exception thrown:
{code}
==
HIVE FAILURE OUTPUT
==


==
END HIVE FAILURE OUTPUT
==

14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
...
{code}
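
The following is a minimal Scala sketch of the failure mode described above: if the
{{HiveContext}} subclass used by the Thrift server eagerly wires up the session
streams, {{HiveContext.runHive}} always has a non-null {{sessionState.out}} to write
to. This is only an illustration, not the change made in PR #1753; the class name
{{ThriftServerHiveContext}} is hypothetical, and it assumes {{sessionState}} and its
{{out}}/{{err}} fields are visible to the subclass.

{code}
import java.io.PrintStream

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Hypothetical subclass for illustration: force the lazily initialized session
// state to materialize and give it concrete output streams at construction time,
// so later calls to runHive never hit a null sessionState.out / sessionState.err.
class ThriftServerHiveContext(sc: SparkContext) extends HiveContext(sc) {
  sessionState.out = new PrintStream(System.out, true, "UTF-8")
  sessionState.err = new PrintStream(System.err, true, "UTF-8")
}
{code}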


> HiveThriftServer throws NPE when executing native commands
> --
>
> Key: SPARK-2814
> URL: https://issues.apache.org/jira/browse/SPARK-2814
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> After [PR #1686|https://github.com/apache/spark/pull/1686], 
> {{HiveThriftServer2}} throws an exception when executing native commands.
> The reason is that the initialization of {{HiveContext.sessionState.out}} and 
> {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
> uses an overridden version of {{HiveContext}} that doesn't know how to 
> initialize these two streams. When {{HiveContext.runHive}} tries to write to 
> {{HiveContext.sessionState.out}}, an NPE is thrown.
> Reproduction steps:
> # Start HiveThriftServer2
> # Connect to it via beeline
> # Execute `set;`
> Exception thrown:
> {code}
> ==
> HIVE FAILURE OUTPUT
> ==
> ==
> END HIVE FAILURE OUTPUT
> ==
> 14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
> at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
> at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2814) HiveThriftServer throws NPE when executing native commands

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083974#comment-14083974
 ] 

Apache Spark commented on SPARK-2814:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/1753

> HiveThriftServer throws NPE when executing native commands
> --
>
> Key: SPARK-2814
> URL: https://issues.apache.org/jira/browse/SPARK-2814
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Cheng Lian
>
> After [PR #1686|https://github.com/apache/spark/pull/1686], 
> {{HiveThriftServer2}} throws an exception when executing native commands.
> The reason is that the initialization of {{HiveContext.sessionState.out}} and 
> {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
> uses an overridden version of {{HiveContext}} that doesn't know how to 
> initialize these two streams.
> Reproduction steps:
> # Start HiveThriftServer2
> # Connect to it via beeline
> # Execute `set;`
> Exception thrown:
> {code}
> ==
> HIVE FAILURE OUTPUT
> ==
> ==
> END HIVE FAILURE OUTPUT
> ==
> 14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
> java.lang.NullPointerException
> at 
> org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
> at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
> at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
> at 
> org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-2814) HiveThriftServer throws NPE when executing native commands

2014-08-03 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-2814:
-

 Summary: HiveThriftServer throws NPE when executing native commands
 Key: SPARK-2814
 URL: https://issues.apache.org/jira/browse/SPARK-2814
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Cheng Lian


After [PR #1686|https://github.com/apache/spark/pull/1686], 
{{HiveThriftServer2}} throws an exception when executing native commands.

The reason is that the initialization of {{HiveContext.sessionState.out}} and 
{{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} 
uses an overridden version of {{HiveContext}} that doesn't know how to 
initialize these two streams.

Reproduction steps:

# Start HiveThriftServer2
# Connect to it via beeline
# Execute `set;`

Exception thrown:
{code}
==
HIVE FAILURE OUTPUT
==


==
END HIVE FAILURE OUTPUT
==

14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
at 
org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
...
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2803) add Kafka stream feature for fetch messages from specified starting offset position

2014-08-03 Thread pengyanhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083973#comment-14083973
 ] 

pengyanhong commented on SPARK-2803:


Resolved this issue in pull request #1602.

> add Kafka stream feature for fetch messages from specified starting offset 
> position
> ---
>
> Key: SPARK-2803
> URL: https://issues.apache.org/jira/browse/SPARK-2803
> Project: Spark
>  Issue Type: New Feature
>  Components: Input/Output
>Reporter: pengyanhong
>  Labels: patch
>
> There are some use cases where we want to fetch messages from a specified 
> offset position, such as:
> * replaying messages
> * dealing with transactions
> * skipping a batch of incorrect messages
> * fetching messages at random according to an index
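
Below is a minimal Scala sketch of what "fetch from a specified starting offset" looks
like against Kafka 0.8's low-level {{SimpleConsumer}} API, which the existing
high-level receiver does not expose. It is only an illustration of the upstream Kafka
API as I understand it, not the proposed Spark change; the broker host, topic,
partition, and offset values are placeholders.

{code}
import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

// Placeholder connection and topic values, for illustration only.
val consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "offset-demo")
val startOffset = 12345L  // the specified starting offset position

val request = new FetchRequestBuilder()
  .clientId("offset-demo")
  .addFetch("my-topic", 0, startOffset, 1024 * 1024)  // topic, partition, offset, maxBytes
  .build()

val response = consumer.fetch(request)
// Messages arrive starting at startOffset; nextOffset tells us where to resume.
for (msgAndOffset <- response.messageSet("my-topic", 0)) {
  val payload = msgAndOffset.message.payload  // ByteBuffer holding the message bytes
  println(s"offset=${msgAndOffset.offset} next=${msgAndOffset.nextOffset} bytes=${payload.remaining()}")
}
consumer.close()
{code}

A receiver built on this kind of call could start each partition's stream at a
caller-supplied offset instead of whatever the high-level consumer's ZooKeeper state
dictates, which is what the use cases listed above (replay, skipping bad messages,
indexed access) require.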



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system

2014-08-03 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083942#comment-14083942
 ] 

Sebb commented on SPARK-1449:
-

No need to check out the directory tree (which is large); you can remove files 
directly from SVN using 
"svn delete" (aliases: del, remove, rm).

By default, all members of the Spark PMC [1] will have karma to update the 
dist/release/spark tree.
In particular, whoever uploaded the last release should have ensured that 
previous releases were tidied up a few days after uploading the latest release 
...

The PMC can vote to ask Infra if they wish the dist/release/spark tree to be 
updatable by non-PMC members as well.

[1] http://people.apache.org/committers-by-project.html#spark-pmc

> Please delete old releases from mirroring system
> 
>
> Key: SPARK-1449
> URL: https://issues.apache.org/jira/browse/SPARK-1449
> Project: Spark
>  Issue Type: Task
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1
>Reporter: Sebb
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> Thanks!
> [Note that older releases are always available from the ASF archive server]
> Any links to older releases on download pages should first be adjusted to 
> point to the archive server.
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system

2014-08-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083933#comment-14083933
 ] 

Sean Owen commented on SPARK-1449:
--

Sebb, is this just a matter of "svn co 
https://dist.apache.org/repos/dist/release/spark/" and svn rm'ing the 0.9.1 and 
1.0.0 releases?
I'd do it, but I don't have access (I think).

[~pwendell] Maybe this can be a step in the release process, if it isn't already? It 
may well be, and these older ones were just missed last time.

> Please delete old releases from mirroring system
> 
>
> Key: SPARK-1449
> URL: https://issues.apache.org/jira/browse/SPARK-1449
> Project: Spark
>  Issue Type: Task
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1
>Reporter: Sebb
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> Thanks!
> [Note that older releases are always available from the ASF archive server]
> Any links to older releases on download pages should first be adjusted to 
> point to the archive server.
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1

2014-08-03 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083929#comment-14083929
 ] 

Sean Owen commented on SPARK-1997:
--

Was scalalogging a problem per se? The issue was that Spark used a different 
version, but now it doesn't use it at all, and there is no conflict. Unless I 
misunderstand, it would be fine to use breeze 0.8.1 + Scala 2.10 in the current 
Spark code.

> Update breeze to version 0.8.1
> --
>
> Key: SPARK-1997
> URL: https://issues.apache.org/jira/browse/SPARK-1997
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Guoqiang Li
>Assignee: Guoqiang Li
> Fix For: 1.1.0
>
>
> {{breeze 0.7}} does not support {{scala 2.11}} .



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1022) Add unit tests for kafka streaming

2014-08-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083928#comment-14083928
 ] 

Apache Spark commented on SPARK-1022:
-

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/1751

> Add unit tests for kafka streaming
> --
>
> Key: SPARK-1022
> URL: https://issues.apache.org/jira/browse/SPARK-1022
> Project: Spark
>  Issue Type: Bug
>Reporter: Patrick Wendell
>Assignee: Saisai Shao
>
> It would be nice if we could add unit tests to verify elements of Kafka's 
> stream. Right now we do only integration tests, which makes it hard to upgrade 
> versions of Kafka. The place to start here would be to look at how Kafka 
> tests itself and see if that functionality can be exposed to third-party users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org