[jira] [Resolved] (SPARK-2510) word2vec: Distributed Representation of Words
[ https://issues.apache.org/jira/browse/SPARK-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-2510.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.1.0

Issue resolved by pull request 1719
[https://github.com/apache/spark/pull/1719]

> word2vec: Distributed Representation of Words
> ---------------------------------------------
>
>                 Key: SPARK-2510
>                 URL: https://issues.apache.org/jira/browse/SPARK-2510
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Liquan Pei
>            Assignee: Liquan Pei
>             Fix For: 1.1.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> We would like to add a parallel implementation of word2vec to MLlib. word2vec
> learns distributed representations of words by training on large data sets.
> Our initial implementation will focus on the skip-gram model and hierarchical
> softmax.

--
This message was sent by Atlassian JIRA (v6.2#6252)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
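For reference, once this lands in 1.1.0 the new feature can be exercised roughly as below. This is a sketch, not code from the patch: it assumes a live SparkContext `sc` and a hypothetical whitespace-tokenized corpus file; `Word2Vec` and `findSynonyms` are the MLlib feature-package names added by this issue.

```scala
import org.apache.spark.mllib.feature.Word2Vec

// Each corpus element is one tokenized sentence (the file path is hypothetical).
val sentences = sc.textFile("corpus.txt").map(_.split(" ").toSeq)

// Trains the skip-gram model with hierarchical softmax described in the issue.
val model = new Word2Vec().fit(sentences)

// Query the learned vector space for the words nearest to a given one.
model.findSynonyms("spark", 5).foreach { case (word, similarity) =>
  println(s"$word: $similarity")
}
```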
[jira] [Created] (SPARK-2823) GraphX jobs throw IllegalArgumentException
Lu Lu created SPARK-2823:
-------------------------
    Summary: GraphX jobs throw IllegalArgumentException
    Key: SPARK-2823
    URL: https://issues.apache.org/jira/browse/SPARK-2823
    Project: Spark
    Issue Type: Bug
    Components: GraphX
    Reporter: Lu Lu

If users set "spark.default.parallelism" to a value that differs from the
EdgeRDD partition number, GraphX jobs will throw IllegalArgumentException:

14/07/26 21:06:51 WARN DAGScheduler: Creating new stage failed due to exception - job: 1
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
        at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:60)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:54)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:197)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:272)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:274)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:274)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit$1$1.apply(DAGScheduler.scala:269)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit$1(DAGScheduler.scala:269)
        at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:279)
        at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:219)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:672)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1184)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
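The mismatch above can be reproduced roughly with a configuration like the following. This is a sketch: the edge-list path and partition counts are hypothetical, and any GraphX job that zips the vertex and edge RDDs with a defaulted shuffle may trigger it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

// spark.default.parallelism (8) differs from the edge partition count (4).
val sc = new SparkContext(
  new SparkConf().setAppName("repro").set("spark.default.parallelism", "8"))
val graph = GraphLoader.edgeListFile(sc, "edges.txt", minEdgePartitions = 4)

// A shuffle inside GraphX then pairs RDDs with 4 and 8 partitions:
// java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
graph.connectedComponents().vertices.count()
```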
[jira] [Issue Comment Deleted] (SPARK-2820) Group by query not returning random values
[ https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Athira Das updated SPARK-2820:
------------------------------
    Comment: was deleted

(was: sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] and so on,
but instead I am getting output like [1,2,34], [2,2,45] and so on. I am not able to get the
id properly; some random values are populated instead.)

> Group by query not returning random values
> ------------------------------------------
>
>          Key: SPARK-2820
>          URL: https://issues.apache.org/jira/browse/SPARK-2820
>      Project: Spark
>   Issue Type: Question
>     Reporter: Athira Das
>
> sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month").
> For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] and so on,
> but instead I am getting output like [1,2,34], [2,2,45] and so on. I am not able to get the
> id properly; some random values are populated instead.
[jira] [Reopened] (SPARK-2820) Group by query not returning random values
[ https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Athira Das reopened SPARK-2820:
-------------------------------

> Group by query not returning random values
> ------------------------------------------
>
>          Key: SPARK-2820
>          URL: https://issues.apache.org/jira/browse/SPARK-2820
>      Project: Spark
>   Issue Type: Question
>     Reporter: Athira Das
[jira] [Commented] (SPARK-2820) Group by query not returning random values
[ https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084365#comment-14084365 ]

Athira Das commented on SPARK-2820:
-----------------------------------

sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] and so on,
but instead I am getting output like [1,2,34], [2,2,45] and so on. I am not able to get the
id properly; some random values are populated instead.

> Group by query not returning random values
> ------------------------------------------
>
>          Key: SPARK-2820
>          URL: https://issues.apache.org/jira/browse/SPARK-2820
>      Project: Spark
>   Issue Type: Question
>     Reporter: Athira Das
[jira] [Updated] (SPARK-2820) Group by query not returning random values
[ https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Athira Das updated SPARK-2820:
------------------------------
    Description:

sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month").
For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] and so on,
but instead I am getting output like [1,2,34], [2,2,45] and so on. I am not able to get the
id properly; some random values are populated instead.

> Group by query not returning random values
> ------------------------------------------
>
>          Key: SPARK-2820
>          URL: https://issues.apache.org/jira/browse/SPARK-2820
>      Project: Spark
>   Issue Type: Question
>     Reporter: Athira Das
>
> sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month").
> For this query the output should be [id_1, 2, 50], [id_1, 3, 34], [id_2, 2, 47] and so on,
> but instead I am getting output like [1,2,34], [2,2,45] and so on. I am not able to get the
> id properly; some random values are populated instead.
[jira] [Created] (SPARK-2822) Group by returning random values in SparkSQL
Athira Das created SPARK-2822:
------------------------------
    Summary: Group by returning random values in SparkSQL
    Key: SPARK-2822
    URL: https://issues.apache.org/jira/browse/SPARK-2822
    Project: Spark
    Issue Type: Question
    Reporter: Athira Das
[jira] [Created] (SPARK-2821) Group by returning random values in Spark SQL. While running the query sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month")
Athira Das created SPARK-2821:
------------------------------
    Summary: Group by returning random values in Spark SQL. While running the query
             sqlContext.sql("SELECT id, month, AVG(marks) FROM data WHERE marks>25 GROUP BY id, month")
    Key: SPARK-2821
    URL: https://issues.apache.org/jira/browse/SPARK-2821
    Project: Spark
    Issue Type: Question
    Reporter: Athira Das
[jira] [Closed] (SPARK-2820) Group by query not returning random values
[ https://issues.apache.org/jira/browse/SPARK-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Athira Das closed SPARK-2820.
-----------------------------
    Resolution: Fixed

> Group by query not returning random values
> ------------------------------------------
>
>          Key: SPARK-2820
>          URL: https://issues.apache.org/jira/browse/SPARK-2820
>      Project: Spark
>   Issue Type: Question
>     Reporter: Athira Das
[jira] [Comment Edited] (SPARK-2812) convert maven to archetype based build
[ https://issues.apache.org/jira/browse/SPARK-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084347#comment-14084347 ]

Prashant Sharma edited comment on SPARK-2812 at 8/4/14 6:17 AM:
----------------------------------------------------------------

What do you mean by an archetype-based build? Also, why can't we just ignore
the Maven warnings and have expressions in the build names?

was (Author: prashant_):
What do you mean by an archetype-based build? Can you explain it? Also, why
can't we just ignore the Maven warnings and have expressions in the build names?

> convert maven to archetype based build
> --------------------------------------
>
>          Key: SPARK-2812
>          URL: https://issues.apache.org/jira/browse/SPARK-2812
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Build, Spark Core
>     Reporter: Anand Avati
>
> In order to support Scala 2.10 and 2.11 parallel builds.
> A build profile in pom.xml is insufficient, as it is not possible to have
> expressions/variables in the artifact names of sub-modules.
[jira] [Created] (SPARK-2820) Group by query not returning random values
Athira Das created SPARK-2820:
------------------------------
    Summary: Group by query not returning random values
    Key: SPARK-2820
    URL: https://issues.apache.org/jira/browse/SPARK-2820
    Project: Spark
    Issue Type: Question
    Reporter: Athira Das
[jira] [Commented] (SPARK-2812) convert maven to archetype based build
[ https://issues.apache.org/jira/browse/SPARK-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084347#comment-14084347 ]

Prashant Sharma commented on SPARK-2812:
----------------------------------------

What do you mean by an archetype-based build? Can you explain it? Also, why
can't we just ignore the Maven warnings and have expressions in the build names?

> convert maven to archetype based build
> --------------------------------------
>
>          Key: SPARK-2812
>          URL: https://issues.apache.org/jira/browse/SPARK-2812
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Build, Spark Core
>     Reporter: Anand Avati
>
> In order to support Scala 2.10 and 2.11 parallel builds.
> A build profile in pom.xml is insufficient, as it is not possible to have
> expressions/variables in the artifact names of sub-modules.
[jira] [Updated] (SPARK-2818) Improve joining RDDs transformed from the same cached RDD
[ https://issues.apache.org/jira/browse/SPARK-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lu Lu updated SPARK-2818:
-------------------------
    Component/s: Spark Core
    Description:

If the joining RDDs originate from the same cached RDD a, the DAGScheduler will
submit redundant stages to compute and cache RDD a. For example:

    val edges = sc.textFile(...).cache()
    val bigSrc = edges.groupByKey().filter(...)
    val reversed = edges.map(edge => (edge._2, edge._1))
    val bigDst = reversed.groupByKey().filter(...)
    bigSrc.join(bigDst).count

The final count action triggers two stages that both compute the edges RDD.
This leads to two performance problems:
(1) If resources are sufficient, the two stages run concurrently and read the
same HDFS file at the same time.
(2) If the two stages run one after the other, the tasks of the latter stage
can read the cached blocks of the edges RDD directly, but the latter stage
cannot achieve data locality because the block locations are not known when
the stages are submitted.

> Improve joining RDDs transformed from the same cached RDD
> ---------------------------------------------------------
>
>          Key: SPARK-2818
>          URL: https://issues.apache.org/jira/browse/SPARK-2818
>      Project: Spark
>   Issue Type: Improvement
>   Components: Spark Core
>     Reporter: Lu Lu
>
> If the joining RDDs originate from the same cached RDD a, the DAGScheduler
> will submit redundant stages to compute and cache RDD a. For example:
>
>     val edges = sc.textFile(...).cache()
>     val bigSrc = edges.groupByKey().filter(...)
>     val reversed = edges.map(edge => (edge._2, edge._1))
>     val bigDst = reversed.groupByKey().filter(...)
>     bigSrc.join(bigDst).count
>
> The final count action triggers two stages that both compute the edges RDD.
> This leads to two performance problems:
> (1) If resources are sufficient, the two stages run concurrently and read the
> same HDFS file at the same time.
> (2) If the two stages run one after the other, the tasks of the latter stage
> can read the cached blocks of the edges RDD directly, but the latter stage
> cannot achieve data locality because the block locations are not known when
> the stages are submitted.
[jira] [Created] (SPARK-2819) Difficult to turn on intercept with linear models
Sandy Ryza created SPARK-2819:
------------------------------
    Summary: Difficult to turn on intercept with linear models
    Key: SPARK-2819
    URL: https://issues.apache.org/jira/browse/SPARK-2819
    Project: Spark
    Issue Type: Improvement
    Components: MLlib
    Reporter: Sandy Ryza

If I want to train a logistic regression model with default parameters and
include an intercept, I can run:

    val alg = new LogisticRegressionWithSGD()
    alg.setIntercept(true)
    alg.run(data)

but if I want to set a parameter like numIterations, I need to use

    LogisticRegressionWithSGD.train(data, 50)

and have no opportunity to turn on the intercept.
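One workaround with the existing API is to stay with the class-based form and reach through its `optimizer` member, which exposes the knobs that the static `train` helpers set. A sketch, assuming `data` is an RDD[LabeledPoint]:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val alg = new LogisticRegressionWithSGD()
alg.setIntercept(true)
// The underlying GradientDescent optimizer exposes setters for the
// parameters that the static train(data, 50) helper would otherwise set.
alg.optimizer.setNumIterations(50)
val model = alg.run(data)
```

This still leaves the ergonomic gap the issue describes: the convenient static helpers and the configurable builder cannot be combined.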
[jira] [Created] (SPARK-2818) Improve joining RDDs transformed from the same cached RDD
Lu Lu created SPARK-2818:
-------------------------
    Summary: Improve joining RDDs transformed from the same cached RDD
    Key: SPARK-2818
    URL: https://issues.apache.org/jira/browse/SPARK-2818
    Project: Spark
    Issue Type: Improvement
    Reporter: Lu Lu
[jira] [Commented] (SPARK-2817) add "show create table" support
[ https://issues.apache.org/jira/browse/SPARK-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084308#comment-14084308 ]

Apache Spark commented on SPARK-2817:
-------------------------------------

User 'tianyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/1760

> add "show create table" support
> -------------------------------
>
>          Key: SPARK-2817
>          URL: https://issues.apache.org/jira/browse/SPARK-2817
>      Project: Spark
>   Issue Type: Bug
>   Components: SQL
> Affects Versions: 1.0.0
>     Reporter: Yi Tian
>     Priority: Minor
>
> In the Spark SQL component, the "show create table" syntax has been disabled.
> We think it is a useful function for describing a Hive table.
[jira] [Resolved] (SPARK-2272) Feature scaling which standardizes the range of independent variables or features of data.
[ https://issues.apache.org/jira/browse/SPARK-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-2272.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.1.0

Issue resolved by pull request 1207
[https://github.com/apache/spark/pull/1207]

> Feature scaling which standardizes the range of independent variables or
> features of data.
> ------------------------------------------------------------------------
>
>          Key: SPARK-2272
>          URL: https://issues.apache.org/jira/browse/SPARK-2272
>      Project: Spark
>   Issue Type: New Feature
>   Components: MLlib
>     Reporter: DB Tsai
>     Assignee: DB Tsai
>      Fix For: 1.1.0
>
> Feature scaling is a method used to standardize the range of independent
> variables or features of data. In data processing, it is also known as data
> normalization and is generally performed during the data preprocessing step.
> In this work, a trait called `VectorTransformer` is defined for generic
> transformation of a vector. It contains two methods: `apply`, which applies a
> transformation to a vector, and `unapply`, which applies the inverse
> transformation.
> There are three concrete implementations of `VectorTransformer`, and all of
> them can easily be extended with PMML transformation support.
> 1) `VectorStandardizer` - Standardizes a vector given the mean and variance.
> Since standardization densifies the output, the output is always in dense
> vector format.
> 2) `VectorRescaler` - Rescales a vector into a target range specified by a
> tuple of two double values, or by two vectors as the new target minimum and
> maximum. Since rescaling first subtracts the minimum of each column, the
> output is always a dense vector regardless of the input vector type.
> 3) `VectorDivider` - Transforms a vector by dividing by a constant or by
> another vector element-wise. This transformation preserves the type of the
> input vector without densifying the result.
> Utility helper methods are implemented that take an input of RDD[Vector] and
> return the transformed RDD[Vector] and the transformer, for dividing,
> rescaling, normalization, and standardization.
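From the description, the proposed trait can be pictured roughly as follows. This is an illustrative sketch: only the names `VectorTransformer`, `apply`, and `unapply` come from the issue; everything else is an assumption.

```scala
import org.apache.spark.mllib.linalg.Vector

// Generic vector transformation as described in the issue: a forward
// transformation plus its inverse, e.g. standardize / un-standardize.
trait VectorTransformer extends Serializable {
  /** Applies the transformation to a vector. */
  def apply(v: Vector): Vector

  /** Applies the inverse transformation to a vector. */
  def unapply(v: Vector): Vector
}
```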
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084287#comment-14084287 ]

pengyanhong commented on SPARK-2815:
------------------------------------

I changed the YarnAllocationHandler.scala file as below:

    import org.apache.hadoop.yarn.api.records.ApplicationAttemptId
    val amResp = allocateExecutorResources(executorsToRequest)

It then compiles successfully and works on a YARN cluster, but I am not sure
whether there are potential problems.

> Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
> ----------------------------------------------------------
>
>          Key: SPARK-2815
>          URL: https://issues.apache.org/jira/browse/SPARK-2815
>      Project: Spark
>   Issue Type: Bug
>   Components: Build
> Affects Versions: 1.1.0
>     Reporter: pengyanhong
>     Assignee: Guoqiang Li
>     Priority: Blocker
>
> Compilation fails via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true
> SPARK_HIVE=true sbt/sbt assembly, finally giving the error message
> [error] (yarn-stable/compile:compile) Compilation failed. The following is
> the detailed error on the console:
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26: object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.YarnClient
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40: not found: value YarnClient
> [error] val yarnClient = YarnClient.createYarnClient
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32: object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33: object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36: object util is not a member of package org.apache.hadoop.yarn.webapp
> [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64: value RM_AM_MAX_ATTEMPTS is not a member of object org.apache.hadoop.yarn.conf.YarnConfiguration
> [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66: not found: type AMRMClient
> [error] private var amClient: AMRMClient[ContainerRequest] = _
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92: not found: value AMRMClient
> [error] amClient = AMRMClient.createAMRMClient()
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137: not found: value WebAppUtils
> [error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40: object api is not a member of package org.apache.hadoop.yarn.client
> [error] import org.apache.hadoop.yarn.client.api.AMRMClient
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618: not found: type AMRMClient
> [error] amClient: AMRMClient[ContainerRequest],
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596: not found: type AMRMClient
> [error] amClient: AMRMClient[ContainerRequest],
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577: not found: type AMRMClient
> [error] amClient: AMRMClient[ContainerRequest],
> [error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410
[jira] [Created] (SPARK-2817) add "show create table" support
Yi Tian created SPARK-2817:
---------------------------
    Summary: add "show create table" support
    Key: SPARK-2817
    URL: https://issues.apache.org/jira/browse/SPARK-2817
    Project: Spark
    Issue Type: Bug
    Components: SQL
    Affects Versions: 1.0.0
    Reporter: Yi Tian
    Priority: Minor

In the Spark SQL component, the "show create table" syntax has been disabled.
We think it is a useful function for describing a Hive table.
[jira] [Commented] (SPARK-2816) Type-safe SQL queries
[ https://issues.apache.org/jira/browse/SPARK-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084257#comment-14084257 ]

Apache Spark commented on SPARK-2816:
-------------------------------------

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1759

> Type-safe SQL queries
> ---------------------
>
>          Key: SPARK-2816
>          URL: https://issues.apache.org/jira/browse/SPARK-2816
>      Project: Spark
>   Issue Type: New Feature
>   Components: SQL
>     Reporter: Michael Armbrust
>     Assignee: Michael Armbrust
[jira] [Created] (SPARK-2816) Type-safe SQL queries
Michael Armbrust created SPARK-2816:
------------------------------------
    Summary: Type-safe SQL queries
    Key: SPARK-2816
    URL: https://issues.apache.org/jira/browse/SPARK-2816
    Project: Spark
    Issue Type: New Feature
    Components: SQL
    Reporter: Michael Armbrust
    Assignee: Michael Armbrust
[jira] [Closed] (SPARK-2744) The configuration "spark.history.retainedApplications" is invalid
[ https://issues.apache.org/jira/browse/SPARK-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

meiyoula closed SPARK-2744.
---------------------------
    Resolution: Not a Problem

> The configuration "spark.history.retainedApplications" is invalid
> ------------------------------------------------------------------
>
>          Key: SPARK-2744
>          URL: https://issues.apache.org/jira/browse/SPARK-2744
>      Project: Spark
>   Issue Type: Bug
>   Components: Spark Core
>     Reporter: meiyoula
>       Labels: historyserver
>
> When I set it in spark-env.sh like this:
> export SPARK_HISTORY_OPTS=$SPARK_HISTORY_OPTS" -Dspark.history.ui.port=5678 -Dspark.history.retainedApplications=1 "
> the history server web UI retains more than one application.
[jira] [Commented] (SPARK-2583) ConnectionManager cannot distinguish whether error occurred or not
[ https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084195#comment-14084195 ]

Apache Spark commented on SPARK-2583:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/1758

> ConnectionManager cannot distinguish whether error occurred or not
> ------------------------------------------------------------------
>
>          Key: SPARK-2583
>          URL: https://issues.apache.org/jira/browse/SPARK-2583
>      Project: Spark
>   Issue Type: Bug
>   Components: Spark Core
>     Reporter: Kousuke Saruta
>     Assignee: Kousuke Saruta
>     Priority: Critical
>
> ConnectionManager#handleMessage sends an empty message to the peer regardless
> of whether an error occurred in onReceiveCallback.
> {code}
>       val ackMessage = if (onReceiveCallback != null) {
>         logDebug("Calling back")
>         onReceiveCallback(bufferMessage, connectionManagerId)
>       } else {
>         logDebug("Not calling back as callback is null")
>         None
>       }
>
>       if (ackMessage.isDefined) {
>         if (!ackMessage.get.isInstanceOf[BufferMessage]) {
>           logDebug("Response to " + bufferMessage + " is not a buffer message, it is of type "
>             + ackMessage.get.getClass)
>         } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) {
>           logDebug("Response to " + bufferMessage + " does not have ack id set")
>           ackMessage.get.asInstanceOf[BufferMessage].ackId = bufferMessage.id
>         }
>       }
>
>       // We have no way to tell peer whether error occurred or not
>       sendMessage(connectionManagerId, ackMessage.getOrElse {
>         Message.createBufferMessage(bufferMessage.id)
>       })
>     }
> {code}
[jira] [Resolved] (SPARK-2810) update scala-maven-plugin to version 3.2.0
[ https://issues.apache.org/jira/browse/SPARK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-2810.
------------------------------------
    Resolution: Fixed
    Fix Version/s: 1.1.0
    Target Version/s: 1.1.0

Fixed by: https://github.com/apache/spark/pull/1711

> update scala-maven-plugin to version 3.2.0
> ------------------------------------------
>
>          Key: SPARK-2810
>          URL: https://issues.apache.org/jira/browse/SPARK-2810
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Build, Spark Core
>     Reporter: Anand Avati
>     Assignee: Anand Avati
>      Fix For: 1.1.0
>
> Needed for Scala 2.11 'compiler-interface'
[jira] [Updated] (SPARK-2810) update scala-maven-plugin to version 3.2.0
[ https://issues.apache.org/jira/browse/SPARK-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-2810:
-----------------------------------
    Assignee: Anand Avati

> update scala-maven-plugin to version 3.2.0
> ------------------------------------------
>
>          Key: SPARK-2810
>          URL: https://issues.apache.org/jira/browse/SPARK-2810
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Build, Spark Core
>     Reporter: Anand Avati
>     Assignee: Anand Avati
>      Fix For: 1.1.0
>
> Needed for Scala 2.11 'compiler-interface'
[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support
[ https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084164#comment-14084164 ] Apache Spark commented on SPARK-1981: - User 'cfregly' has created a pull request for this issue: https://github.com/apache/spark/pull/1757 > Add AWS Kinesis streaming support > - > > Key: SPARK-1981 > URL: https://issues.apache.org/jira/browse/SPARK-1981 > Project: Spark > Issue Type: New Feature > Components: Streaming >Reporter: Chris Fregly >Assignee: Chris Fregly > Fix For: 1.1.0 > > > Add AWS Kinesis support to Spark Streaming. > Initial discussion occurred here: https://github.com/apache/spark/pull/223 > I discussed this with Parviz from AWS recently and we agreed that I would > take this over. > Look for a new PR that takes into account all the feedback from the earlier > PR, including a spark-1.0-compliant implementation, AWS-license-aware build > support, tests, comments, and style guide compliance.
[jira] [Resolved] (SPARK-1740) Pyspark cancellation kills unrelated pyspark workers
[ https://issues.apache.org/jira/browse/SPARK-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-1740. --- Resolution: Fixed Fix Version/s: 1.1.0 > Pyspark cancellation kills unrelated pyspark workers > > > Key: SPARK-1740 > URL: https://issues.apache.org/jira/browse/SPARK-1740 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.0.0 >Reporter: Aaron Davidson >Assignee: Davies Liu >Priority: Critical > Fix For: 1.1.0 > > > PySpark cancellation calls SparkEnv#destroyPythonWorker. Since there is one > python worker per process, this would seem like a sensible thing to do. > Unfortunately, this method actually destroys a python daemon, and all > associated workers, which generally means that we can cause failures in > unrelated Pyspark jobs. > The severity of this bug is limited by the fact that the Pyspark daemon is > easily recreated, so the tasks will succeed after being restarted. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2360: Target Version/s: 1.2.0 (was: 1.1.0) > CSV import to SchemaRDDs > > > Key: SPARK-2360 > URL: https://issues.apache.org/jira/browse/SPARK-2360 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Michael Armbrust >Assignee: Hossein Falaki >Priority: Minor > > I think the first step is to design the interface that we want to present to > users. Mostly this means defining the options available when importing. Off the top of my > head: > - What is the separator? > - Provide column names, or infer them from the first row? > - How do we handle multiple files with possibly different schemas? > - Do we have a method to let users specify the datatypes of the columns, or > are they just strings? > - What types of quoting / escaping do we want to support?
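The design questions listed above (separator, header handling, inferred column names) can be made concrete with a small sketch. The names `CsvOptionsDemo` and `parse` are hypothetical, and this is not the interface SPARK-2360 eventually adopted; it only shows the option surface being discussed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of a CSV import option surface: configurable separator,
// and column names taken from the first row or inferred as c0, c1, ...
public class CsvOptionsDemo {
    static List<Map<String, String>> parse(List<String> lines, char sep, boolean headerInFirstRow) {
        String sepRe = Pattern.quote(String.valueOf(sep));
        String[] header;
        int start;
        if (headerInFirstRow) {
            header = lines.get(0).split(sepRe, -1); // first row supplies column names
            start = 1;
        } else {
            int n = lines.get(0).split(sepRe, -1).length;
            header = new String[n];
            for (int i = 0; i < n; i++) header[i] = "c" + i; // inferred names
            start = 0;
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = start; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(sepRe, -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < header.length; j++) {
                row.put(header[j], j < cells.length ? cells[j] : null); // pad short rows
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Semicolon separator with a header row:
        System.out.println(parse(Arrays.asList("a;b", "1;2"), ';', true));
    }
}
```

Typed columns and quoting/escaping, the remaining two questions, would layer on top of this: a per-column datatype map and a real quote-aware tokenizer instead of a plain split.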
[jira] [Resolved] (SPARK-2783) Basic support for analyze in HiveContext
[ https://issues.apache.org/jira/browse/SPARK-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2783. - Resolution: Fixed Fix Version/s: 1.1.0 > Basic support for analyze in HiveContext > > > Key: SPARK-2783 > URL: https://issues.apache.org/jira/browse/SPARK-2783 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Yin Huai >Priority: Blocker > Fix For: 1.1.0 > > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2360: Priority: Major (was: Minor) > CSV import to SchemaRDDs > > > Key: SPARK-2360 > URL: https://issues.apache.org/jira/browse/SPARK-2360 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Michael Armbrust >Assignee: Hossein Falaki > > I think the first step is to design the interface that we want to present to > users. Mostly this means defining the options available when importing. Off the top of my > head: > - What is the separator? > - Provide column names, or infer them from the first row? > - How do we handle multiple files with possibly different schemas? > - Do we have a method to let users specify the datatypes of the columns, or > are they just strings? > - What types of quoting / escaping do we want to support?
[jira] [Resolved] (SPARK-2752) spark sql cli should not exit when getting an exception
[ https://issues.apache.org/jira/browse/SPARK-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2752. - Resolution: Fixed Target Version/s: 1.1.0 > spark sql cli should not exit when getting an exception > -- > > Key: SPARK-2752 > URL: https://issues.apache.org/jira/browse/SPARK-2752 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.0.0 >Reporter: wangfei > Fix For: 1.1.0 > >
[jira] [Resolved] (SPARK-2784) Make language configurable using SQLConf instead of hql/sql functions
[ https://issues.apache.org/jira/browse/SPARK-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2784. - Resolution: Fixed Fix Version/s: 1.1.0 > Make language configurable using SQLConf instead of hql/sql functions > - > > Key: SPARK-2784 > URL: https://issues.apache.org/jira/browse/SPARK-2784 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust >Priority: Blocker > Fix For: 1.1.0 > > -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2814) HiveThriftServer throws NPE when executing native commands
[ https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2814. - Resolution: Fixed Fix Version/s: 1.1.0 Assignee: Cheng Lian > HiveThriftServer throws NPE when executing native commands > -- > > Key: SPARK-2814 > URL: https://issues.apache.org/jira/browse/SPARK-2814 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 1.1.0 > > > After [PR #1686|https://github.com/apache/spark/pull/1686], > {{HiveThriftServer2}} throws an exception when executing native commands. > The reason is that initialization of {{HiveContext.sessionState.out}} and > {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} > uses an overridden version of {{HiveContext}} that doesn't know how to > initialize these two streams. When {{HiveContext.runHive}} tries to write to > {{HiveContext.sessionState.out}}, an NPE is thrown. > Reproduction steps: > # Start HiveThriftServer2 > # Connect to it via beeline > # Execute `set;` > Exception thrown: > {code} > == > HIVE FAILURE OUTPUT > == > == > END HIVE FAILURE OUTPUT > == > 14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query: > java.lang.NullPointerException > at > org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210) > at > org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173) > at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144) > at > org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59) > at > org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50) > ... > {code}
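The failure mode described here, a lazily-initialized field that an overriding subclass never triggers, is easy to reproduce outside Spark. The classes below are hypothetical stand-ins rather than `HiveContext` itself, but they follow the same shape: writing through the uninitialized stream throws the NPE.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Simplified illustration of the reported bug: a stream field whose
// initialization was made lazy, and a subclass that never runs it.
// Names are hypothetical; this is not Spark's HiveContext.
public class LazyInitDemo {
    static class Base {
        PrintStream out; // only initialized when init() is called
        void init() { out = new PrintStream(new ByteArrayOutputStream()); }
        void runCommand(String cmd) { out.println(cmd); } // NPE if init() never ran
    }

    static class Overriding extends Base {
        // Overrides setup but forgets to initialize the streams, analogous to
        // the HiveContext subclass used by HiveThriftServer2.
    }

    static boolean throwsNpe() {
        try {
            new Overriding().runCommand("set;"); // same trigger as the report: `set;`
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe()); // prints: true
    }
}
```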
[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1
[ https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084057#comment-14084057 ] Xiangrui Meng commented on SPARK-1997: -- It's fine within Spark. If we add breeze-0.8.1 with scalalogging-2.1.1, users may have trouble using Spark with their own library if it depends on scalalogging-1.0.1. This is why we removed scalalogging dependency from Spark SQL, so there is no reason to add it back, no matter which version it is. David already merged the PR that removes scalalogging from breeze. We are now waiting for him to help cut a new release of breeze, without scalalogging. > Update breeze to version 0.8.1 > -- > > Key: SPARK-1997 > URL: https://issues.apache.org/jira/browse/SPARK-1997 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Guoqiang Li >Assignee: Guoqiang Li > Fix For: 1.1.0 > > > {{breeze 0.7}} does not support {{scala 2.11}} . -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2197) Spark invoke DecisionTree by Java
[ https://issues.apache.org/jira/browse/SPARK-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2197. -- Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1740 [https://github.com/apache/spark/pull/1740] > Spark invoke DecisionTree by Java > - > > Key: SPARK-2197 > URL: https://issues.apache.org/jira/browse/SPARK-2197 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: wulin >Assignee: Joseph K. Bradley > Fix For: 1.1.0 > > > Strategy strategy = new Strategy(Algo.Classification(), new Impurity() { > @Override > public double calculate(double arg0, double arg1, > double arg2) { > return Gini.calculate(arg0, arg1, arg2); > } > @Override > public double calculate(double arg0, double arg1) { > return Gini.calculate(arg0, arg1); > } > }, 5, 100, QuantileStrategy.Sort(), null, 256); > DecisionTree decisionTree = new DecisionTree(strategy); > final DecisionTreeModel decisionTreeModel = > decisionTree.train(labeledPoints.rdd()); > i try to run it on spark, but find an error on the console: > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to > [Lorg.apache.spark.mllib.regression.LabeledPoint; > at > org.apache.spark.mllib.tree.DecisionTree$.findSplitsBins(DecisionTree.scala:990) > at org.apache.spark.mllib.tree.DecisionTree.train(DecisionTree.scala:56) > at > org.project.modules.spark.java.SparkDecisionTree.main(SparkDecisionTree.java:75) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > i view source code, find > val 
numFeatures = input.take(1)(0).features.size > this is a problem.
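The reported `ClassCastException` is ordinary Java array erasure: a generic collection materializes as `Object[]`, which cannot be cast to a typed array such as `LabeledPoint[]`. A self-contained illustration using `String[]` (no Spark classes needed); the mechanism is identical.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates the cast failure reported above, with String[] standing in
// for LabeledPoint[].
public class ErasureDemo {
    static boolean castFails() {
        List<String> xs = new ArrayList<>();
        xs.add("point");
        Object[] boxed = xs.toArray(); // runtime type is Object[], whatever the element type
        try {
            String[] typed = (String[]) boxed; // throws ClassCastException
            return typed.length < 0;           // unreachable
        } catch (ClassCastException e) {
            return true;
        }
    }

    // The typed-array overload avoids the problem (loosely analogous to
    // carrying proper element-type information on the Scala side).
    static String[] typedCopy() {
        List<String> xs = new ArrayList<>();
        xs.add("point");
        return xs.toArray(new String[0]);
    }
}
```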
[jira] [Resolved] (SPARK-2246) Add user-data option to EC2 scripts
[ https://issues.apache.org/jira/browse/SPARK-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2246. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1186 [https://github.com/apache/spark/pull/1186] > Add user-data option to EC2 scripts > --- > > Key: SPARK-2246 > URL: https://issues.apache.org/jira/browse/SPARK-2246 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Allan Douglas R. de Oliveira >Assignee: Allan Douglas R. de Oliveira > Fix For: 1.1.0 > > > EC2 servers can use a "user-data" script for custom startup/initialization > of machines. The EC2 scripts should provide an option to set this.
[jira] [Updated] (SPARK-2246) Add user-data option to EC2 scripts
[ https://issues.apache.org/jira/browse/SPARK-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2246: --- Assignee: Allan Douglas R. de Oliveira > Add user-data option to EC2 scripts > --- > > Key: SPARK-2246 > URL: https://issues.apache.org/jira/browse/SPARK-2246 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Allan Douglas R. de Oliveira >Assignee: Allan Douglas R. de Oliveira > > EC2 servers can use a "user-data" script for custom startup/initialization > of machines. The EC2 scripts should provide an option to set this.
[jira] [Resolved] (SPARK-2712) Add a small note that mvn "package" must happen before "test"
[ https://issues.apache.org/jira/browse/SPARK-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2712. Resolution: Fixed Issue resolved by pull request 1615 [https://github.com/apache/spark/pull/1615] > Add a small note that mvn "package" must happen before "test" > - > > Key: SPARK-2712 > URL: https://issues.apache.org/jira/browse/SPARK-2712 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 0.9.1, 1.0.0, 1.1.1 > Environment: all >Reporter: Stephen Boesch >Assignee: Stephen Boesch >Priority: Trivial > Labels: documentation > Fix For: 1.1.0 > > Original Estimate: 0h > Remaining Estimate: 0h > > Add to the building-with-maven.md: > Requirement: build packages before running tests > Tests must be run AFTER the "package" target has already been executed. The > following is an example of a correct (build, test) sequence: > mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package > mvn -Pyarn -Phadoop-2.3 -Phive test > BTW Reynold Xin requested this tiny doc improvement. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084034#comment-14084034 ] Guoqiang Li commented on SPARK-2815: Currently {{yarn-alpha}} does not support version {{2.0.0-cdh4.5.0}}, but seems to support version {{2.0.0-cdh4.2.0}} {{2.0.0-cdh4.5.0}} get following error: {noformat} [ERROR] /Users/witgo/work/code/java/spark/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:36: object AMResponse is not a member of package org.apache.hadoop.yarn.api.records [ERROR] import org.apache.hadoop.yarn.api.records.{AMResponse, ApplicationAttemptId} [ERROR]^ [ERROR] /Users/witgo/work/code/java/spark/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:114: value getAMResponse is not a member of org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse [ERROR] val amResp = allocateExecutorResources(executorsToRequest).getAMResponse {noformat} > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 >Reporter: pengyanhong >Assignee: Guoqiang Li >Priority: Blocker > > compile fail via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true > SPARK_HIVE=true sbt/sbt assembly, finally get error message : [error] > (yarn-stable/compile:compile) Compilation failed, the following is the detail > error on console: > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.YarnClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40: > not found: value YarnClient > [error] val yarnClient = YarnClient.createYarnClient > [error]^ > 
[error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36: > object util is not a member of package org.apache.hadoop.yarn.webapp > [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64: > value RM_AM_MAX_ATTEMPTS is not a member of object > org.apache.hadoop.yarn.conf.YarnConfiguration > [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, > YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66: > not found: type AMRMClient > [error] private var amClient: AMRMClient[ContainerRequest] = _ > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92: > not found: value AMRMClient > [error] amClient = AMRMClient.createAMRMClient() > [error]^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137: > not found: value WebAppUtils > [error] val proxy = WebAppUtils.getProxyHostAndPort(conf) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40: > object api is not a member of 
package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy
[jira] [Commented] (SPARK-1335) Also increase perm gen / code cache for scalatest when invoked via Maven build
[ https://issues.apache.org/jira/browse/SPARK-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084026#comment-14084026 ] Guoqiang Li commented on SPARK-1335: The problem also appears in branch 1.1. The following command fails: {{mvn -Pyarn-alpha -Phive -Dhadoop.version=2.0.0-cdh4.5.0 -DskipTests package}}. I'm on Java 6 / OSX 10.9.4 > Also increase perm gen / code cache for scalatest when invoked via Maven build > -- > > Key: SPARK-1335 > URL: https://issues.apache.org/jira/browse/SPARK-1335 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 0.9.0 >Reporter: Sean Owen >Assignee: Sean Owen > Fix For: 1.0.0 > > > I am observing build failures when the Maven build reaches tests in the new > SQL components. (I'm on Java 7 / OSX 10.9.) The failure is the usual > complaint from scala, that it's out of permgen space, or that the JIT is out of code > cache space. > I see that various build scripts increase both of these for SBT. This change > simply adds these settings to scalatest's arguments. Works for me and seems a > bit more consistent. > (In the PR I'm going to tack on some other little changes too -- see PR.)
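For reference, the kind of plugin configuration the fix describes is sketched below. Treat it as an assumption-laden example rather than the committed change: the exact JVM flag values in Spark's pom.xml may differ.

```xml
<!-- Hypothetical sketch: pass perm gen / code cache settings to scalatest
     when tests run under Maven; actual values in Spark's pom.xml may differ. -->
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <configuration>
    <argLine>-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m</argLine>
  </configuration>
</plugin>
```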
[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support
[ https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084025#comment-14084025 ] Nicholas Chammas commented on SPARK-1981: - Word. Thanks for the clarification! > Add AWS Kinesis streaming support > - > > Key: SPARK-1981 > URL: https://issues.apache.org/jira/browse/SPARK-1981 > Project: Spark > Issue Type: New Feature > Components: Streaming >Reporter: Chris Fregly >Assignee: Chris Fregly > Fix For: 1.1.0 > > > Add AWS Kinesis support to Spark Streaming. > Initial discussion occurred here: https://github.com/apache/spark/pull/223 > I discussed this with Parviz from AWS recently and we agreed that I would > take this over. > Look for a new PR that takes into account all the feedback from the earlier > PR, including a spark-1.0-compliant implementation, AWS-license-aware build > support, tests, comments, and style guide compliance.
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084009#comment-14084009 ] Sean Owen commented on SPARK-2815: -- Your build command is out of date. SPARK_HADOOP_VERSION et al are deprecated. You should build with Maven, but SBT should work too. [~gq]'s command looks correct. See http://spark.apache.org/docs/latest/building-with-maven.html which documents this. > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 >Reporter: pengyanhong >Assignee: Guoqiang Li >Priority: Blocker > > compile fail via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true > SPARK_HIVE=true sbt/sbt assembly, finally get error message : [error] > (yarn-stable/compile:compile) Compilation failed, the following is the detail > error on console: > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.YarnClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40: > not found: value YarnClient > [error] val yarnClient = YarnClient.createYarnClient > [error]^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest > 
[error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36: > object util is not a member of package org.apache.hadoop.yarn.webapp > [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64: > value RM_AM_MAX_ATTEMPTS is not a member of object > org.apache.hadoop.yarn.conf.YarnConfiguration > [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, > YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66: > not found: type AMRMClient > [error] private var amClient: AMRMClient[ContainerRequest] = _ > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92: > not found: value AMRMClient > [error] amClient = AMRMClient.createAMRMClient() > [error]^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137: > not found: value WebAppUtils > [error] val proxy = WebAppUtils.getProxyHostAndPort(conf) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > 
[error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410: > value CONTAINER_ID is not a member of ob
[jira] [Comment Edited] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084006#comment-14084006 ] Guoqiang Li edited comment on SPARK-2815 at 8/3/14 3:10 PM: [~pengyanhong] You can try this {{./sbt/sbt clean assembly -Pyarn-alpha -Phive -Dhadoop.version=2.0.0-cdh4.5.0}} was (Author: gq): [~pengyanhong] You can try this first {{./sbt/sbt clean assembly -Pyarn-alpha -Phive -Dhadoop.version=2.0.0-cdh4.5.0}} > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.1.0 >Reporter: pengyanhong >Assignee: Guoqiang Li >Priority: Blocker > > compile fail via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true > SPARK_HIVE=true sbt/sbt assembly, finally get error message : [error] > (yarn-stable/compile:compile) Compilation failed, the following is the detail > error on console: > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.YarnClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40: > not found: value YarnClient > [error] val yarnClient = YarnClient.createYarnClient > [error]^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import 
org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36: > object util is not a member of package org.apache.hadoop.yarn.webapp > [error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64: > value RM_AM_MAX_ATTEMPTS is not a member of object > org.apache.hadoop.yarn.conf.YarnConfiguration > [error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, > YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66: > not found: type AMRMClient > [error] private var amClient: AMRMClient[ContainerRequest] = _ > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92: > not found: value AMRMClient > [error] amClient = AMRMClient.createAMRMClient() > [error]^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137: > not found: value WebAppUtils > [error] val proxy = WebAppUtils.getProxyHostAndPort(conf) > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40: > object api is not a member of package org.apache.hadoop.yarn.client > [error] import org.apache.hadoop.yarn.client.api.AMRMClient > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596: > not found: type 
AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577: > not found: type AMRMClient > [error] amClient: AMRMClient[ContainerRequest], > [error] ^ > [error] > /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.sc
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084006#comment-14084006 ] Guoqiang Li commented on SPARK-2815: [~pengyanhong] You can try this first {{./sbt/sbt clean assembly -Pyarn-alpha -Phive -Dhadoop.version=2.0.0-cdh4.5.0}} > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.1.0 > Reporter: pengyanhong > Assignee: Guoqiang Li > Priority: Blocker
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084000#comment-14084000 ] Apache Spark commented on SPARK-2815: - User 'witgo' has created a pull request for this issue: https://github.com/apache/spark/pull/1754 > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.1.0 > Reporter: pengyanhong > Assignee: Guoqiang Li > Priority: Blocker
[jira] [Commented] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
[ https://issues.apache.org/jira/browse/SPARK-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083998#comment-14083998 ] Guoqiang Li commented on SPARK-2815: I also encountered this bug. PRed: https://github.com/apache/spark/pull/1754 > Compilation failed upon the hadoop version 2.0.0-cdh4.5.0 > - > > Key: SPARK-2815 > URL: https://issues.apache.org/jira/browse/SPARK-2815 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 1.1.0 > Reporter: pengyanhong > Assignee: Guoqiang Li > Priority: Blocker
[jira] [Created] (SPARK-2815) Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
pengyanhong created SPARK-2815:
--
Summary: Compilation failed upon the hadoop version 2.0.0-cdh4.5.0
Key: SPARK-2815
URL: https://issues.apache.org/jira/browse/SPARK-2815
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.1.0
Reporter: pengyanhong
Priority: Blocker

Compilation fails via SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true SPARK_HIVE=true sbt/sbt assembly; the build ends with [error] (yarn-stable/compile:compile) Compilation failed. The detailed errors on the console are:

[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:26: object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.YarnClient
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:40: not found: value YarnClient
[error] val yarnClient = YarnClient.createYarnClient
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:32: object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:33: object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:36: object util is not a member of package org.apache.hadoop.yarn.webapp
[error] import org.apache.hadoop.yarn.webapp.util.WebAppUtils
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:64: value RM_AM_MAX_ATTEMPTS is not a member of object org.apache.hadoop.yarn.conf.YarnConfiguration
[error] YarnConfiguration.RM_AM_MAX_ATTEMPTS, YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS)
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:66: not found: type AMRMClient
[error] private var amClient: AMRMClient[ContainerRequest] = _
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:92: not found: value AMRMClient
[error] amClient = AMRMClient.createAMRMClient()
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:137: not found: value WebAppUtils
[error] val proxy = WebAppUtils.getProxyHostAndPort(conf)
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:40: object api is not a member of package org.apache.hadoop.yarn.client
[error] import org.apache.hadoop.yarn.client.api.AMRMClient
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:618: not found: type AMRMClient
[error] amClient: AMRMClient[ContainerRequest],
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:596: not found: type AMRMClient
[error] amClient: AMRMClient[ContainerRequest],
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:577: not found: type AMRMClient
[error] amClient: AMRMClient[ContainerRequest],
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:410: value CONTAINER_ID is not a member of object org.apache.hadoop.yarn.api.ApplicationConstants.Environment
[error] val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:128: value setTokens is not a member of org.apache.hadoop.yarn.api.records.ContainerLaunchContext
[error] amContainer.setTokens(ByteBuffer.wrap(dob.getData()))
[error] ^
[error] /Users/pengyanhong/git/spark/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.
[jira] [Updated] (SPARK-2814) HiveThriftServer throws NPE when executing native commands
[ https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2814:
--
Description:
After [PR #1686|https://github.com/apache/spark/pull/1686], {{HiveThriftServer2}} throws an exception when executing native commands. The reason is that initialization of {{HiveContext.sessionState.out}} and {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} uses an overridden version of {{HiveContext}} that doesn't know how to initialize these two streams. When {{HiveContext.runHive}} tries to write to {{HiveContext.sessionState.out}}, an NPE is thrown.

Reproduction steps:
# Start HiveThriftServer2
# Connect to it via beeline
# Execute `set;`

Exception thrown:
{code}
==
HIVE FAILURE OUTPUT
==
==
END HIVE FAILURE OUTPUT
==
14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
at org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
at org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
...
{code}

was:
After [PR #1686|https://github.com/apache/spark/pull/1686], {{HiveThriftServer2}} throws an exception when executing native commands. The reason is that initialization of {{HiveContext.sessionState.out}} and {{HiveContext.sessionState.err}} was made lazy, while {{HiveThriftServer2}} uses an overridden version of {{HiveContext}} that doesn't know how to initialize these two streams.

Reproduction steps:
# Start HiveThriftServer2
# Connect to it via beeline
# Execute `set;`

Exception thrown:
{code}
==
HIVE FAILURE OUTPUT
==
==
END HIVE FAILURE OUTPUT
==
14/08/03 21:30:55 ERROR SparkSQLOperationManager: Error executing query:
java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:210)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:173)
at org.apache.spark.sql.hive.HiveContext.set(HiveContext.scala:144)
at org.apache.spark.sql.execution.SetCommand.sideEffectResult$lzycompute(commands.scala:59)
at org.apache.spark.sql.execution.SetCommand.sideEffectResult(commands.scala:50)
...
{code}

> HiveThriftServer throws NPE when executing native commands
> --
>
> Key: SPARK-2814
> URL: https://issues.apache.org/jira/browse/SPARK-2814
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Cheng Lian

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
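The lazy-initialization failure mode described in SPARK-2814 can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins, not Spark's actual {{HiveContext}} or session-state classes; they only demonstrate the pattern: a base class initializes an output stream lazily, and a subclass that writes to the stream without forcing that initialization hits a NullPointerException on the first write.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical stand-in for a session-state holder whose stream is
// left null until explicitly initialized.
class SessionState {
    PrintStream out;
}

// Base class: initializes the stream lazily, and forces that
// initialization before every write.
class BaseContext {
    final SessionState sessionState = new SessionState();
    private boolean initialized = false;

    void ensureInitialized() {
        if (!initialized) {
            sessionState.out = new PrintStream(new ByteArrayOutputStream());
            initialized = true;
        }
    }

    void runCommand(String cmd) {
        ensureInitialized(); // force lazy initialization before writing
        sessionState.out.println(cmd);
    }
}

// Subclass that "doesn't know" about the lazy initialization: it writes
// directly to the stream, so sessionState.out is still null -> NPE.
class OverriddenContext extends BaseContext {
    @Override
    void runCommand(String cmd) {
        sessionState.out.println(cmd); // throws NullPointerException
    }
}

public class LazyInitNpeDemo {
    public static void main(String[] args) {
        new BaseContext().runCommand("set;"); // fine: stream initialized on demand
        try {
            new OverriddenContext().runCommand("set;");
        } catch (NullPointerException e) {
            System.out.println("NPE: stream was never initialized");
        }
    }
}
```

The fix direction is the same in either form: the subclass must force (or eagerly perform) the initialization before the first write to the stream.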
[jira] [Commented] (SPARK-2814) HiveThriftServer throws NPE when executing native commands
[ https://issues.apache.org/jira/browse/SPARK-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083974#comment-14083974 ] Apache Spark commented on SPARK-2814: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/1753 > HiveThriftServer throws NPE when executing native commands > -- > > Key: SPARK-2814 > URL: https://issues.apache.org/jira/browse/SPARK-2814 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.1.0 > Reporter: Cheng Lian
[jira] [Created] (SPARK-2814) HiveThriftServer throws NPE when executing native commands
Cheng Lian created SPARK-2814: - Summary: HiveThriftServer throws NPE when executing native commands Key: SPARK-2814 URL: https://issues.apache.org/jira/browse/SPARK-2814 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Cheng Lian
[jira] [Commented] (SPARK-2803) add Kafka stream feature for fetch messages from specified starting offset position
[ https://issues.apache.org/jira/browse/SPARK-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083973#comment-14083973 ] pengyanhong commented on SPARK-2803: Resolved this issue in pull request #1602 > add Kafka stream feature for fetch messages from specified starting offset > position > --- > > Key: SPARK-2803 > URL: https://issues.apache.org/jira/browse/SPARK-2803 > Project: Spark > Issue Type: New Feature > Components: Input/Output > Reporter: pengyanhong > Labels: patch > > There are several use cases where we want to fetch messages from a specified offset position, for example: > * replay messages > * deal with transactions > * skip bulk incorrect messages > * randomly fetch a message according to an index
[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system
[ https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083942#comment-14083942 ] Sebb commented on SPARK-1449: - No need to check out the directory tree (which is large), you can remove files directly from SVN using "svn delete (del, remove, rm)". By default all members of the Spark PMC [1] will have karma to update the dist/release/spark tree. In particular whoever uploaded the last release should have ensured that previous releases were tidied up a few days after uploading the latest release ... The PMC can vote to ask Infra if they wish the dist/release/spark tree to be updateable by non-PMC members as well. [1] http://people.apache.org/committers-by-project.html#spark-pmc > Please delete old releases from mirroring system > > > Key: SPARK-1449 > URL: https://issues.apache.org/jira/browse/SPARK-1449 > Project: Spark > Issue Type: Task > Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1 > Reporter: Sebb > > To reduce the load on the ASF mirrors, projects are required to delete old releases [1] > Please can you remove all non-current releases? > Thanks! > [Note that older releases are always available from the ASF archive server] > Any links to older releases on download pages should first be adjusted to point to the archive server. > [1] http://www.apache.org/dev/release.html#when-to-archive
[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system
[ https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083933#comment-14083933 ] Sean Owen commented on SPARK-1449: -- Sebb, is this just a matter of "svn co https://dist.apache.org/repos/dist/release/spark/" and svn rm'ing the 0.9.1 and 1.0.0 releases? I'd do it, but I don't have access, I think. [~pwendell] maybe this can be a step in the release process if not already? It may well be, and these older ones were just missed last time. > Please delete old releases from mirroring system > > > Key: SPARK-1449 > URL: https://issues.apache.org/jira/browse/SPARK-1449 > Project: Spark > Issue Type: Task > Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.9.1 > Reporter: Sebb
[jira] [Commented] (SPARK-1997) Update breeze to version 0.8.1
[ https://issues.apache.org/jira/browse/SPARK-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083929#comment-14083929 ] Sean Owen commented on SPARK-1997: -- Was scalalogging a problem per se? The issue was that Spark used a different version, but now it doesn't use it at all, and there is no conflict. Unless I misunderstand, it would be fine to use breeze 0.8.1 + Scala 2.10 in the current Spark code. > Update breeze to version 0.8.1 > -- > > Key: SPARK-1997 > URL: https://issues.apache.org/jira/browse/SPARK-1997 > Project: Spark > Issue Type: Sub-task > Components: MLlib > Reporter: Guoqiang Li > Assignee: Guoqiang Li > Fix For: 1.1.0 > > > {{breeze 0.7}} does not support {{scala 2.11}}.
[jira] [Commented] (SPARK-1022) Add unit tests for kafka streaming
[ https://issues.apache.org/jira/browse/SPARK-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083928#comment-14083928 ] Apache Spark commented on SPARK-1022: - User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/1751 > Add unit tests for kafka streaming > -- > > Key: SPARK-1022 > URL: https://issues.apache.org/jira/browse/SPARK-1022 > Project: Spark > Issue Type: Bug > Reporter: Patrick Wendell > Assignee: Saisai Shao > > It would be nice if we could add unit tests to verify elements of Kafka's stream. Right now we do integration tests only, which makes it hard to upgrade versions of Kafka. The place to start here would be to look at how Kafka tests itself and see if the functionality can be exposed to third-party users.