[jira] [Commented] (SPARK-3033) [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106557#comment-14106557 ] pengyanhong commented on SPARK-3033: I changed the file {quote} sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala {quote} and modified the method eval in the class HiveGenericUdf as below: {quote}
while (i < children.length) {
  val idx = i
  deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { children(idx).eval(input) })
  if (deferedObjects(i).get().isInstanceOf[java.math.BigDecimal] == true) {
    val decimal = deferedObjects(i).get().asInstanceOf[java.math.BigDecimal]
    val data = new org.apache.hadoop.hive.common.`type`.HiveDecimal(decimal).asInstanceOf[EvaluatedType]
    deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { data.asInstanceOf[EvaluatedType] })
  }
  i += 1
}
{quote} I also changed the method wrap in the trait HiveInspectors, adding the line: {quote} case b: org.apache.hadoop.hive.common.`type`.HiveDecimal => b {quote} So this issue has been fixed. [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal Key: SPARK-3033 URL: https://issues.apache.org/jira/browse/SPARK-3033 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.0.2 Reporter: pengyanhong Priority: Blocker run a complex HiveQL via yarn-cluster, got error as below: {quote} 14/08/14 15:05:24 WARN org.apache.spark.Logging$class.logWarning(Logging.scala:70): Loss was due to java.lang.ClassCastException java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveJavaObject(JavaHiveDecimalObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveDecimal(PrimitiveObjectInspectorUtils.java:1022) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$HiveDecimalConverter.convert(PrimitiveObjectInspectorConverter.java:306) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:179) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.evaluate(GenericUDFIf.java:82) at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:276) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:51) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:309) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:303) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3033) [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106557#comment-14106557 ] pengyanhong edited comment on SPARK-3033 at 8/22/14 6:54 AM: - I changed the file {quote} sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala {quote} and modified the method eval in the class HiveGenericUdf as below: {code}
while (i < children.length) {
  val idx = i
  deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { children(idx).eval(input) })
  if (deferedObjects(i).get().isInstanceOf[java.math.BigDecimal] == true) {
    val decimal = deferedObjects(i).get().asInstanceOf[java.math.BigDecimal]
    val data = new org.apache.hadoop.hive.common.`type`.HiveDecimal(decimal).asInstanceOf[EvaluatedType]
    deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { data.asInstanceOf[EvaluatedType] })
  }
  i += 1
}
{code} I also changed the method wrap in the trait HiveInspectors, adding the line: {code} case b: org.apache.hadoop.hive.common.`type`.HiveDecimal => b {code} So this issue has been fixed. was (Author: pengyanhong): I changed the file {quote} sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala {quote} and modified the method eval in the class HiveGenericUdf as below: {quote}
while (i < children.length) {
  val idx = i
  deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { children(idx).eval(input) })
  if (deferedObjects(i).get().isInstanceOf[java.math.BigDecimal] == true) {
    val decimal = deferedObjects(i).get().asInstanceOf[java.math.BigDecimal]
    val data = new org.apache.hadoop.hive.common.`type`.HiveDecimal(decimal).asInstanceOf[EvaluatedType]
    deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { data.asInstanceOf[EvaluatedType] })
  }
  i += 1
}
{quote} I also changed the method wrap in the trait HiveInspectors, adding the line: {quote} case b: org.apache.hadoop.hive.common.`type`.HiveDecimal => b {quote} So this issue has been fixed.
[Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal Key: SPARK-3033 URL: https://issues.apache.org/jira/browse/SPARK-3033 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.0.2 Reporter: pengyanhong Priority: Blocker run a complex HiveQL via yarn-cluster, got error as below: {quote} 14/08/14 15:05:24 WARN org.apache.spark.Logging$class.logWarning(Logging.scala:70): Loss was due to java.lang.ClassCastException java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveJavaObject(JavaHiveDecimalObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveDecimal(PrimitiveObjectInspectorUtils.java:1022) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$HiveDecimalConverter.convert(PrimitiveObjectInspectorConverter.java:306) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:179) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.evaluate(GenericUDFIf.java:82) at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:276) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:51) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:309) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:303) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at
[jira] [Comment Edited] (SPARK-3033) [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106557#comment-14106557 ] pengyanhong edited comment on SPARK-3033 at 8/22/14 6:58 AM: - I changed the file {quote} sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala {quote} and modified the method eval in the class HiveGenericUdf as below: {code}
while (i < children.length) {
  val idx = i
  deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { children(idx).eval(input) })
  if (deferedObjects(i).get().isInstanceOf[java.math.BigDecimal] == true) {
    val decimal = deferedObjects(i).get().asInstanceOf[java.math.BigDecimal]
    deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { new org.apache.hadoop.hive.common.`type`.HiveDecimal(decimal).asInstanceOf[EvaluatedType] })
  }
  i += 1
}
{code} I also changed the method wrap in the trait HiveInspectors, adding the line: {code} case b: org.apache.hadoop.hive.common.`type`.HiveDecimal => b {code} So this issue has been fixed. was (Author: pengyanhong): I changed the file {quote} sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala {quote} and modified the method eval in the class HiveGenericUdf as below: {code}
while (i < children.length) {
  val idx = i
  deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { children(idx).eval(input) })
  if (deferedObjects(i).get().isInstanceOf[java.math.BigDecimal] == true) {
    val decimal = deferedObjects(i).get().asInstanceOf[java.math.BigDecimal]
    val data = new org.apache.hadoop.hive.common.`type`.HiveDecimal(decimal).asInstanceOf[EvaluatedType]
    deferedObjects(i).asInstanceOf[DeferredObjectAdapter].set(() => { data.asInstanceOf[EvaluatedType] })
  }
  i += 1
}
{code} I also changed the method wrap in the trait HiveInspectors, adding the line: {code} case b: org.apache.hadoop.hive.common.`type`.HiveDecimal => b {code} So this issue has been fixed.
[Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal Key: SPARK-3033 URL: https://issues.apache.org/jira/browse/SPARK-3033 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 1.0.2 Reporter: pengyanhong Priority: Blocker run a complex HiveQL via yarn-cluster, got error as below: {quote} 14/08/14 15:05:24 WARN org.apache.spark.Logging$class.logWarning(Logging.scala:70): Loss was due to java.lang.ClassCastException java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveJavaObject(JavaHiveDecimalObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveDecimal(PrimitiveObjectInspectorUtils.java:1022) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$HiveDecimalConverter.convert(PrimitiveObjectInspectorConverter.java:306) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:179) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.evaluate(GenericUDFIf.java:82) at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:276) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:51) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:309) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:303) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at
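For readers following the workaround above, here is a consolidated, hedged sketch of the conversion it relies on. It assumes the Hive 0.12-era API, where HiveDecimal still exposed a public constructor taking a java.math.BigDecimal (later Hive versions replaced it with HiveDecimal.create); the helper name toHiveDecimal is illustrative and not part of the patch.
{code}
import org.apache.hadoop.hive.common.`type`.HiveDecimal

// Wrap a java.math.BigDecimal before Hive's object inspectors see it, so that
// JavaHiveDecimalObjectInspector.getPrimitiveJavaObject can cast it safely.
def toHiveDecimal(value: Any): Any = value match {
  case bd: java.math.BigDecimal => new HiveDecimal(bd)
  case other => other
}
{code}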
[jira] [Created] (SPARK-3181) Add Robust Regression Algorithm with Huber Estimator
Fan Jiang created SPARK-3181: Summary: Add Robust Regression Algorithm with Huber Estimator Key: SPARK-3181 URL: https://issues.apache.org/jira/browse/SPARK-3181 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.2 Reporter: Fan Jiang Priority: Critical Fix For: 1.1.1, 1.2.0 Linear least squares estimates assume the errors have a normal distribution and can behave badly when the errors are heavy-tailed. In practice we encounter various types of data, so we need to include robust regression, which employs a fitting criterion that is not as vulnerable as least squares. In 1973, Huber introduced M-estimation ("maximum likelihood type" estimation) for regression. The method is resistant to outliers in the response variable and has been widely used. The new feature for MLlib will contain 3 new files: /main/scala/org/apache/spark/mllib/regression/RobustRegression.scala /test/scala/org/apache/spark/mllib/regression/RobustRegressionSuite.scala /main/scala/org/apache/spark/examples/mllib/HuberRobustRegression.scala and one new class HuberRobustGradient in /main/scala/org/apache/spark/mllib/optimization/Gradient.scala -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
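To make the proposal concrete, here is a minimal, hedged sketch of the Huber loss and its gradient for a single example. It is deliberately independent of the actual MLlib Gradient API the ticket proposes; the object and method names are illustrative only.
{code}
object HuberSketch {
  // k is the tuning constant; k = 1.345 gives roughly 95% efficiency when the
  // errors really are Gaussian, while still capping the influence of outliers.
  def lossAndGradient(x: Array[Double], y: Double, w: Array[Double],
                      k: Double = 1.345): (Double, Array[Double]) = {
    val r = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y  // residual
    if (math.abs(r) <= k) {
      (0.5 * r * r, x.map(_ * r))                  // quadratic region: least squares
    } else {
      (k * math.abs(r) - 0.5 * k * k,              // linear region: bounded influence
       x.map(_ * k * math.signum(r)))
    }
  }
}
{code}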
[jira] [Commented] (SPARK-3181) Add Robust Regression Algorithm with Huber Estimator
[ https://issues.apache.org/jira/browse/SPARK-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106679#comment-14106679 ] Apache Spark commented on SPARK-3181: - User 'fjiang6' has created a pull request for this issue: https://github.com/apache/spark/pull/2096 Add Robust Regression Algorithm with Huber Estimator Key: SPARK-3181 URL: https://issues.apache.org/jira/browse/SPARK-3181 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.2 Reporter: Fan Jiang Priority: Critical Labels: features Fix For: 1.1.1, 1.2.0 Original Estimate: 0h Remaining Estimate: 0h Linear least squares estimates assume the errors have a normal distribution and can behave badly when the errors are heavy-tailed. In practice we encounter various types of data, so we need to include robust regression, which employs a fitting criterion that is not as vulnerable as least squares. In 1973, Huber introduced M-estimation ("maximum likelihood type" estimation) for regression. The method is resistant to outliers in the response variable and has been widely used. The new feature for MLlib will contain 3 new files: /main/scala/org/apache/spark/mllib/regression/RobustRegression.scala /test/scala/org/apache/spark/mllib/regression/RobustRegressionSuite.scala /main/scala/org/apache/spark/examples/mllib/HuberRobustRegression.scala and one new class HuberRobustGradient in /main/scala/org/apache/spark/mllib/optimization/Gradient.scala -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3182) Twitter Streaming Geolocation Filter
Daniel Kershaw created SPARK-3182: - Summary: Twitter Streaming Geolocation Filter Key: SPARK-3182 URL: https://issues.apache.org/jira/browse/SPARK-3182 Project: Spark Issue Type: Wish Components: Streaming Affects Versions: 1.0.2, 1.0.0 Reporter: Daniel Kershaw Fix For: 1.2.0 Add a geolocation filter to the Twitter Streaming component. This should take a sequence of doubles indicating the bounding box for the stream. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
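Until such a filter exists in the receiver itself, one hedged sketch of the idea is to filter client-side on the DStream returned by TwitterUtils, using the geolocation twitter4j attaches to geotagged statuses. The bounding-box values and the sparkConf variable below are illustrative assumptions.
{code}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val ssc = new StreamingContext(sparkConf, Seconds(2))  // assumes an existing SparkConf
val stream = TwitterUtils.createStream(ssc, None)

// Bounding box as (swLon, swLat, neLon, neLat), the layout the wish describes.
val box = Seq(-122.75, 36.8, -121.75, 37.8)
val geoTagged = stream.filter { status =>
  val loc = status.getGeoLocation                      // null unless geotagged
  loc != null &&
    loc.getLongitude >= box(0) && loc.getLongitude <= box(2) &&
    loc.getLatitude  >= box(1) && loc.getLatitude  <= box(3)
}
{code}
Note this only drops tweets after they arrive; pushing the box into the upstream filter, as the ticket asks, would also reduce traffic.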
[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106851#comment-14106851 ] Hingorani, Vineet commented on SPARK-2360: -- Hello Michael, I saw your comment thread on a mail archive about being able to manipulate CSV files using Spark. Could you please let me know whether this functionality is available in the latest release of Spark? I have installed the latest version and am running it on my local machine. Thank you Regards, Vineet Hingorani Developer Associate Custom Development Strategic Projects group (CDSP) Products Innovation (PI) SAP SE WDF 03, C3.03 E vineet.hingor...@sap.com CSV import to SchemaRDDs Key: SPARK-2360 URL: https://issues.apache.org/jira/browse/SPARK-2360 Project: Spark Issue Type: New Feature Components: SQL Reporter: Michael Armbrust Assignee: Hossein Falaki I think the first step is to design the interface that we want to present to users. Mostly this is defining options when importing. Off the top of my head: - What is the separator? - Provide column names or infer them from the first row? - How do we handle multiple files with possibly different schemas? - Do we have a method to let users specify the datatypes of the columns, or are they just strings? - What types of quoting / escaping do we want to support? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2742) The variables inputFormatInfo and inputFormatMap are never used
[ https://issues.apache.org/jira/browse/SPARK-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-2742. -- Resolution: Fixed Fix Version/s: 1.2.0 The variables inputFormatInfo and inputFormatMap are never used -- Key: SPARK-2742 URL: https://issues.apache.org/jira/browse/SPARK-2742 Project: Spark Issue Type: Bug Components: YARN Reporter: meiyoula Priority: Minor Fix For: 1.2.0 The ClientArguments class has two variables that are never used: inputFormatInfo and inputFormatMap. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2742) The variables inputFormatInfo and inputFormatMap are never used
[ https://issues.apache.org/jira/browse/SPARK-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106997#comment-14106997 ] Thomas Graves commented on SPARK-2742: -- https://github.com/apache/spark/pull/1614 The variables inputFormatInfo and inputFormatMap are never used -- Key: SPARK-2742 URL: https://issues.apache.org/jira/browse/SPARK-2742 Project: Spark Issue Type: Bug Components: YARN Reporter: meiyoula Priority: Minor Fix For: 1.2.0 The ClientArguments class has two variables that are never used: inputFormatInfo and inputFormatMap. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3183) Add option for requesting full YARN cluster
Sandy Ryza created SPARK-3183: - Summary: Add option for requesting full YARN cluster Key: SPARK-3183 URL: https://issues.apache.org/jira/browse/SPARK-3183 Project: Spark Issue Type: Improvement Components: YARN Reporter: Sandy Ryza This could possibly be in the form of --executor-cores ALL --executor-memory ALL --num-executors ALL. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3184) Allow user to specify num tasks to use for a table
Andy Konwinski created SPARK-3184: - Summary: Allow user to specify num tasks to use for a table Key: SPARK-3184 URL: https://issues.apache.org/jira/browse/SPARK-3184 Project: Spark Issue Type: Improvement Components: SQL Reporter: Andy Konwinski -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support
[ https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107131#comment-14107131 ] Jonathan Kelly commented on SPARK-1981: --- The code here is a cleaned-up version of the code from that article that Chris took over from Parviz and integrated into Spark itself, and it will be available whenever Spark 1.1 is released. Add AWS Kinesis streaming support - Key: SPARK-1981 URL: https://issues.apache.org/jira/browse/SPARK-1981 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Chris Fregly Assignee: Chris Fregly Fix For: 1.1.0 Add AWS Kinesis support to Spark Streaming. Initial discussion occurred here: https://github.com/apache/spark/pull/223 I discussed this with Parviz from AWS recently and we agreed that I would take this over. Look for a new PR that takes into account all the feedback from the earlier PR including spark-1.0-compliant implementation, AWS-license-aware build support, tests, comments, and style guide compliance. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2921) Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other things)
[ https://issues.apache.org/jira/browse/SPARK-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2921: --- Priority: Blocker (was: Critical) Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other things) --- Key: SPARK-2921 URL: https://issues.apache.org/jira/browse/SPARK-2921 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.2 Reporter: Andrew Or Priority: Blocker Fix For: 1.1.0 The code path to handle this exists only for the coarse-grained mode, and even in this mode the Java options aren't passed to the executors properly. We currently pass the entire value of spark.executor.extraJavaOptions to the executors as a string without splitting it. We need to use Utils.splitCommandString as in standalone mode. I have not confirmed this, but I would assume spark.executor.extraClassPath and spark.executor.extraLibraryPath are also not propagated correctly in either mode. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
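A hedged illustration of the splitting problem described above (the option string and values are made up; Utils.splitCommandString itself is private[spark]):
{code}
val extraJavaOpts = "-XX:+PrintGCDetails -Dkey=\"some value\""

// Wrong: the whole string becomes a single argv element, so the executor JVM
// is handed one malformed flag instead of two options.
val unsplit = Seq(extraJavaOpts)

// Right: split on whitespace while honoring quotes, as standalone mode does.
val split = Seq("-XX:+PrintGCDetails", "-Dkey=some value")
{code}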
[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107155#comment-14107155 ] Hossein Falaki commented on SPARK-2360: --- There is a pull request for this issue: https://github.com/apache/spark/pull/1351 It did not make it into Spark 1.1 due to last-minute API changes, but it will make it into the next release. The API will provide a very easy (default) way of reading common CSV files (e.g., comma-delimited) into SchemaRDDs. Users will be able to specify the delimiter and quotation characters. CSV import to SchemaRDDs Key: SPARK-2360 URL: https://issues.apache.org/jira/browse/SPARK-2360 Project: Spark Issue Type: New Feature Components: SQL Reporter: Michael Armbrust Assignee: Hossein Falaki I think the first step is to design the interface that we want to present to users. Mostly this is defining options when importing. Off the top of my head: - What is the separator? - Provide column names or infer them from the first row? - How do we handle multiple files with possibly different schemas? - Do we have a method to let users specify the datatypes of the columns, or are they just strings? - What types of quoting / escaping do we want to support? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
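While waiting for that API, a hedged sketch of the manual route available with the Spark 1.1 applySchema API follows; the file path, the trivial comma splitting (no quoting support), and the all-strings schema are simplifying assumptions.
{code}
import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)         // assumes an existing SparkContext sc
val lines = sc.textFile("data/people.csv")  // hypothetical path
val header = lines.first()                  // eager, see the laziness comment below

// All columns as strings, named from the header row.
val schema = StructType(header.split(",").map(f => StructField(f.trim, StringType, nullable = true)))
val rows = lines.filter(_ != header).map(l => Row(l.split(",", -1).map(_.trim): _*))
val people = sqlContext.applySchema(rows, schema)
people.registerTempTable("people")          // hypothetical table name
{code}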
[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107233#comment-14107233 ] Erik Erlandson commented on SPARK-2360: --- It appears that this is not a pure lazy transform, as it invokes {{first()}} when inferring the schema from headers. I wrote up some ideas on this, pertaining to SPARK-2315, here: http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/ CSV import to SchemaRDDs Key: SPARK-2360 URL: https://issues.apache.org/jira/browse/SPARK-2360 Project: Spark Issue Type: New Feature Components: SQL Reporter: Michael Armbrust Assignee: Hossein Falaki I think the first step is to design the interface that we want to present to users. Mostly this is defining options when importing. Off the top of my head: - What is the separator? - Provide column names or infer them from the first row? - How do we handle multiple files with possibly different schemas? - Do we have a method to let users specify the datatypes of the columns, or are they just strings? - What types of quoting / escaping do we want to support? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3176) Implement 'POWER', 'ABS' and 'LAST' for SQL
[ https://issues.apache.org/jira/browse/SPARK-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107237#comment-14107237 ] Apache Spark commented on SPARK-3176: - User 'xinyunh' has created a pull request for this issue: https://github.com/apache/spark/pull/2099 Implement 'POWER', 'ABS' and 'LAST' for SQL -- Key: SPARK-3176 URL: https://issues.apache.org/jira/browse/SPARK-3176 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.2, 1.1.0 Environment: All Reporter: Xinyun Huang Priority: Minor Fix For: 1.2.0 Original Estimate: 3h Remaining Estimate: 3h Add support for the mathematical functions POWER and ABS, and for the analytic function LAST, which returns a subset of the rows satisfying a query, within Spark SQL. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
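A hedged usage sketch of what the addition would enable, assuming the syntax the pull request proposes, an existing SQLContext named sqlContext, and a hypothetical registered table named accounts:
{code}
val results = sqlContext.sql(
  "SELECT ABS(balance), POWER(rate, 2), LAST(status) FROM accounts")
{code}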
[jira] [Created] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER
Jeremy Chambers created SPARK-3185: -- Summary: SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER Key: SPARK-3185 URL: https://issues.apache.org/jira/browse/SPARK-3185 Project: Spark Issue Type: Bug Affects Versions: 1.0.2 Environment: Amazon Linux AMI [ec2-user@ip-172-30-1-145 ~]$ uname -a Linux ip-172-30-1-145 3.10.42-52.145.amzn1.x86_64 #1 SMP Tue Jun 10 23:46:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux https://aws.amazon.com/amazon-linux-ami/2014.03-release-notes/ The build I used (and MD5 verified): [ec2-user@ip-172-30-1-145 ~]$ wget http://supergsego.com/apache/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz Reporter: Jeremy Chambers org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 When I launch SPARK 1.0.2 on Hadoop 2 in a new EC2 cluster, the above tachyon exception is thrown when Formatting JOURNAL_FOLDER. No exception occurs when I launch on Hadoop 1. Launch used: ./spark-ec2 -k spark_cluster -i /home/ec2-user/kagi/spark_cluster.ppk --zone=us-east-1a --hadoop-major-version=2 --spot-price=0.0165 -s 3 launch sparkProd log snippet Formatting Tachyon Master @ ec2-54-80-49-244.compute-1.amazonaws.com Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/ Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73) at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53) at tachyon.UnderFileSystem.get(UnderFileSystem.java:53) at tachyon.Format.main(Format.java:54) Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at org.apache.hadoop.ipc.Client.call(Client.java:1070) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69) ... 3 more Killed 0 processes Killed 0 processes ec2-54-167-219-159.compute-1.amazonaws.com: Killed 0 processes ec2-54-198-198-17.compute-1.amazonaws.com: Killed 0 processes ec2-54-166-36-0.compute-1.amazonaws.com: Killed 0 processes ---end snippet--- *** I don't have this problem when I launch without the --hadoop-major-version=2 (which defaults to Hadoop 1.x) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107300#comment-14107300 ] Jeremy Chambers commented on SPARK-3185: Cross reference: http://apache-spark-user-list.1001560.n3.nabble.com/Server-IPC-version-7-cannot-communicate-with-client-version-4-with-Spark-Streaming-1-0-0-in-Java-ande-tp9908p9914.html SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER --- Key: SPARK-3185 URL: https://issues.apache.org/jira/browse/SPARK-3185 Project: Spark Issue Type: Bug Affects Versions: 1.0.2 Environment: Amazon Linux AMI [ec2-user@ip-172-30-1-145 ~]$ uname -a Linux ip-172-30-1-145 3.10.42-52.145.amzn1.x86_64 #1 SMP Tue Jun 10 23:46:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux https://aws.amazon.com/amazon-linux-ami/2014.03-release-notes/ The build I used (and MD5 verified): [ec2-user@ip-172-30-1-145 ~]$ wget http://supergsego.com/apache/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz Reporter: Jeremy Chambers org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 When I launch SPARK 1.0.2 on Hadoop 2 in a new EC2 cluster, the above tachyon exception is thrown when Formatting JOURNAL_FOLDER. No exception occurs when I launch on Hadoop 1. Launch used: ./spark-ec2 -k spark_cluster -i /home/ec2-user/kagi/spark_cluster.ppk --zone=us-east-1a --hadoop-major-version=2 --spot-price=0.0165 -s 3 launch sparkProd log snippet Formatting Tachyon Master @ ec2-54-80-49-244.compute-1.amazonaws.com Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/ Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73) at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53) at tachyon.UnderFileSystem.get(UnderFileSystem.java:53) at tachyon.Format.main(Format.java:54) Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at org.apache.hadoop.ipc.Client.call(Client.java:1070) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69) ...
3 more Killed 0 processes Killed 0 processes ec2-54-167-219-159.compute-1.amazonaws.com: Killed 0 processes ec2-54-198-198-17.compute-1.amazonaws.com: Killed 0 processes ec2-54-166-36-0.compute-1.amazonaws.com: Killed 0 processes ---end snippet--- *** I don't have this problem when I launch without the --hadoop-major-version=2 (which defaults to Hadoop 1.x) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3185) SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER
[ https://issues.apache.org/jira/browse/SPARK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107388#comment-14107388 ] Jeremy Chambers commented on SPARK-3185: Working on rebuilding the client with Hadoop 2. SPARK launch on Hadoop 2 in EC2 throws Tachyon exception when Formatting JOURNAL_FOLDER --- Key: SPARK-3185 URL: https://issues.apache.org/jira/browse/SPARK-3185 Project: Spark Issue Type: Bug Affects Versions: 1.0.2 Environment: Amazon Linux AMI [ec2-user@ip-172-30-1-145 ~]$ uname -a Linux ip-172-30-1-145 3.10.42-52.145.amzn1.x86_64 #1 SMP Tue Jun 10 23:46:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux https://aws.amazon.com/amazon-linux-ami/2014.03-release-notes/ The build I used (and MD5 verified): [ec2-user@ip-172-30-1-145 ~]$ wget http://supergsego.com/apache/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz Reporter: Jeremy Chambers org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 When I launch SPARK 1.0.2 on Hadoop 2 in a new EC2 cluster, the above tachyon exception is thrown when Formatting JOURNAL_FOLDER. No exception occurs when I launch on Hadoop 1. Launch used: ./spark-ec2 -k spark_cluster -i /home/ec2-user/kagi/spark_cluster.ppk --zone=us-east-1a --hadoop-major-version=2 --spot-price=0.0165 -s 3 launch sparkProd log snippet Formatting Tachyon Master @ ec2-54-80-49-244.compute-1.amazonaws.com Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/ Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73) at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53) at tachyon.UnderFileSystem.get(UnderFileSystem.java:53) at tachyon.Format.main(Format.java:54) Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 at org.apache.hadoop.ipc.Client.call(Client.java:1070) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69) ... 3 more Killed 0 processes Killed 0 processes ec2-54-167-219-159.compute-1.amazonaws.com: Killed 0 processes ec2-54-198-198-17.compute-1.amazonaws.com: Killed 0 processes ec2-54-166-36-0.compute-1.amazonaws.com: Killed 0 processes ---end snippet--- *** I don't have this problem when I launch without the --hadoop-major-version=2 (which defaults to Hadoop 1.x) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3107) Don't pass null jar to executor in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107476#comment-14107476 ] Marcelo Vanzin commented on SPARK-3107: --- The {{--jar}} issue is fixed in SPARK-2933. Where do you see the sys properties issue? Setting a system property to an empty value is semantically different from not setting it, although I'm sceptical it would make a difference here. Don't pass null jar to executor in yarn-client mode --- Key: SPARK-3107 URL: https://issues.apache.org/jira/browse/SPARK-3107 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Reporter: Andrew Or In the following line, ExecutorLauncher's `--jar` takes in null. {code} 14/08/18 20:52:43 INFO yarn.Client: command: $JAVA_HOME/bin/java -server -Xmx512m ... org.apache.spark.deploy.yarn.ExecutorLauncher --class 'notused' --jar null --arg 'ip-172-31-0-12.us-west-2.compute.internal:56838' --executor-memory 1024 --executor-cores 1 --num-executors 2 {code} Also it appears that we set a bunch of system properties to empty strings (not shown). We should avoid setting these if they don't actually contain values. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
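A small illustration of the semantic difference mentioned above (spark.test.flag is a made-up key; runnable in any Scala REPL):
{code}
sys.props("spark.test.flag") = ""                     // key present, empty value
assert(sys.props.get("spark.test.flag") == Some(""))  // reads as "set"

sys.props -= "spark.test.flag"                        // key absent
assert(sys.props.get("spark.test.flag") == None)      // reads as "not set"
{code}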
[jira] [Commented] (SPARK-3101) Missing volatile annotation in ApplicationMaster
[ https://issues.apache.org/jira/browse/SPARK-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107478#comment-14107478 ] Marcelo Vanzin commented on SPARK-3101: --- I covered this in the PR for SPARK-2933 also. Missing volatile annotation in ApplicationMaster Key: SPARK-3101 URL: https://issues.apache.org/jira/browse/SPARK-3101 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Reporter: Kousuke Saruta In ApplicationMaster, a field variable 'isLastAMRetry' is used as a flag but it's not declared as volatile though it's used from multiple threads. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
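The likely shape of the fix is a one-line annotation; a hedged sketch, not the actual patch:
{code}
class ApplicationMasterSketch {
  // @volatile makes writes from one thread (e.g. the reporter thread)
  // immediately visible to other threads (e.g. the shutdown hook).
  @volatile private var isLastAMRetry: Boolean = true
}
{code}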
[jira] [Commented] (SPARK-3102) Add tests for yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107480#comment-14107480 ] Marcelo Vanzin commented on SPARK-3102: --- SPARK-2778? Add tests for yarn-client mode -- Key: SPARK-3102 URL: https://issues.apache.org/jira/browse/SPARK-3102 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.1.0 Reporter: Josh Rosen It looks like some of the {{yarn-client}} code paths aren't exercised by any of the existing tests because my pull request was able to introduce a bug that wasn't caught by Jenkins: https://github.com/apache/spark/pull/2002#discussion-diff-16331781 We should eventually add tests for this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3099) Staging Directory is never deleted when we run a job with YARN Client Mode
[ https://issues.apache.org/jira/browse/SPARK-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107481#comment-14107481 ] Marcelo Vanzin commented on SPARK-3099: --- Pretty sure I covered this in the PR for SPARK-2933. Staging Directory is never deleted when we run a job with YARN Client Mode Key: SPARK-3099 URL: https://issues.apache.org/jira/browse/SPARK-3099 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.1.0 Reporter: Kousuke Saruta When we run an application in YARN Cluster mode, the class 'ApplicationMaster' is used as the ApplicationMaster, which has a shutdown hook to clean up the staging directory (~/.sparkStaging). But when we run an application in YARN Client mode, the class 'ExecutorLauncher' acts as the ApplicationMaster and doesn't clean up the staging directory. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3090) Avoid not stopping SparkContext with YARN Client mode
[ https://issues.apache.org/jira/browse/SPARK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107488#comment-14107488 ] Marcelo Vanzin commented on SPARK-3090: --- I think that if we want to add this, it would be better to do so for all modes, not just yarn-client. Basically have SparkContext itself register a shutdown hook to shut itself down, and publish the priority of the hook so that apps / backends can register hooks that run before it (see Hadoop's ShutdownHookManager for the priority thing - http://goo.gl/BQ1bjk). That way the code in the yarn-cluster backend can be removed too. Avoid not stopping SparkContext with YARN Client mode -- Key: SPARK-3090 URL: https://issues.apache.org/jira/browse/SPARK-3090 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.1.0 Reporter: Kousuke Saruta When we use YARN Cluster mode, the ApplicationMaster registers a shutdown hook that stops the SparkContext. Thanks to this, the SparkContext can be stopped even if the application forgets to stop it itself. But, unfortunately, YARN Client mode doesn't have such a mechanism. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
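A hedged sketch of the suggestion in the comment above, using Hadoop's ShutdownHookManager API (get() and addShutdownHook(Runnable, int)); the object name and priority constant are illustrative. In Hadoop's manager, hooks with higher priority run earlier, so a backend that must run before the context stops registers with a higher value.
{code}
import org.apache.hadoop.util.ShutdownHookManager

object SparkContextShutdown {
  val ContextStopPriority = 50  // hypothetical published constant

  // Have SparkContext install its own stop() as a shutdown hook.
  def install(stop: () => Unit): Unit = {
    ShutdownHookManager.get().addShutdownHook(new Runnable {
      override def run(): Unit = stop()
    }, ContextStopPriority)
  }
}

// A backend hook that must run before the context stops would use:
// ShutdownHookManager.get().addShutdownHook(backendHook, ContextStopPriority + 10)
{code}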
[jira] [Commented] (SPARK-2140) yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM
[ https://issues.apache.org/jira/browse/SPARK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107499#comment-14107499 ] Marcelo Vanzin commented on SPARK-2140: --- Same as SPARK-1287? yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM - Key: SPARK-2140 URL: https://issues.apache.org/jira/browse/SPARK-2140 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Thomas Graves Fix For: 1.0.1, 1.1.0 The yarn stable client doesn't properly remove the MEMORY_OVERHEAD amount from the Java heap size; the code to handle that is commented out (see the function calculateAMMemory). We should fix this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
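For context, a hedged illustration of the accounting the fix should restore (384 MB was the default overhead constant in this era; all values are illustrative):
{code}
val containerMemory = 1024 + 384                        // MB the AM container is granted
val memoryOverhead  = 384                               // room for off-heap usage
val javaHeap        = containerMemory - memoryOverhead  // heap must exclude the overhead
val amJavaOpts      = s"-Xmx${javaHeap}m"               // otherwise YARN kills the container
{code}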
[jira] [Created] (SPARK-3186) Enable parallelism for Reduce Side Join [Spark Branch]
Szehon Ho created SPARK-3186: Summary: Enable parallelism for Reduce Side Join [Spark Branch] Key: SPARK-3186 URL: https://issues.apache.org/jira/browse/SPARK-3186 Project: Spark Issue Type: Bug Reporter: Szehon Ho Blocked by SPARK-2978. See parent JIRA for design details. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3186) Enable parallelism for Reduce Side Join [Spark Branch]
[ https://issues.apache.org/jira/browse/SPARK-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated SPARK-3186: - Description: (was: Blocked by SPARK-2978. See parent JIRA for design details.) Enable parallelism for Reduce Side Join [Spark Branch] --- Key: SPARK-3186 URL: https://issues.apache.org/jira/browse/SPARK-3186 Project: Spark Issue Type: Bug Reporter: Szehon Ho -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3186) Enable parallelism for Reduce Side Join [Spark Branch]
[ https://issues.apache.org/jira/browse/SPARK-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho resolved SPARK-3186. -- Resolution: Invalid Sorry, please ignore; I meant to file this in the Hive project. Enable parallelism for Reduce Side Join [Spark Branch] --- Key: SPARK-3186 URL: https://issues.apache.org/jira/browse/SPARK-3186 Project: Spark Issue Type: Bug Reporter: Szehon Ho Blocked by SPARK-2978. See parent JIRA for design details. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3186) Enable parallelism for Reduce Side Join [Spark Branch]
[ https://issues.apache.org/jira/browse/SPARK-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho closed SPARK-3186. Enable parallelism for Reduce Side Join [Spark Branch] --- Key: SPARK-3186 URL: https://issues.apache.org/jira/browse/SPARK-3186 Project: Spark Issue Type: Bug Reporter: Szehon Ho -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3187) Refactor and cleanup Yarn allocator code
Marcelo Vanzin created SPARK-3187: - Summary: Refactor and cleanup Yarn allocator code Key: SPARK-3187 URL: https://issues.apache.org/jira/browse/SPARK-3187 Project: Spark Issue Type: Improvement Components: YARN Reporter: Marcelo Vanzin Priority: Minor This is a follow-up to SPARK-2933, which dealt with the ApplicationMaster code. There's a lot of logic in the container allocation code in alpha/stable that could probably be merged. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3102) Add tests for yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3102: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-2778 Add tests for yarn-client mode -- Key: SPARK-3102 URL: https://issues.apache.org/jira/browse/SPARK-3102 Project: Spark Issue Type: Sub-task Components: YARN Affects Versions: 1.1.0 Reporter: Josh Rosen It looks like some of the {{yarn-client}} code paths aren't exercised by any of the existing tests because my pull request was able to introduce a bug that wasn't caught by Jenkins: https://github.com/apache/spark/pull/2002#discussion-diff-16331781 We should eventually add tests for this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3102) Add tests for yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107719#comment-14107719 ] Josh Rosen commented on SPARK-3102: --- Good point; I'll convert this to a subtask of that JIRA. Add tests for yarn-client mode -- Key: SPARK-3102 URL: https://issues.apache.org/jira/browse/SPARK-3102 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.1.0 Reporter: Josh Rosen It looks like some of the {{yarn-client}} code paths aren't exercised by any of the existing tests because my pull request was able to introduce a bug that wasn't caught by Jenkins: https://github.com/apache/spark/pull/2002#discussion-diff-16331781 We should eventually add tests for this. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3188) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
Fan Jiang created SPARK-3188: Summary: Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates) Key: SPARK-3188 URL: https://issues.apache.org/jira/browse/SPARK-3188 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.2 Reporter: Fan Jiang Priority: Critical Fix For: 1.1.1, 1.2.0 Linear least squares estimates assume the errors have a normal distribution and can behave badly when the errors are heavy-tailed. In practice we encounter various types of data, so we need to include robust regression, which employs a fitting criterion that is not as vulnerable as least squares. The Tukey bisquare weight function, also referred to as the biweight function, produces an M-estimator that is more resistant to regression outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-3189) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
[ https://issues.apache.org/jira/browse/SPARK-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang closed SPARK-3189. Resolution: Duplicate Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates) --- Key: SPARK-3189 URL: https://issues.apache.org/jira/browse/SPARK-3189 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.2 Reporter: Fan Jiang Priority: Critical Labels: features Fix For: 1.1.1, 1.2.0 Original Estimate: 0h Remaining Estimate: 0h Linear least squares estimates assume the errors have a normal distribution and can behave badly when the errors are heavy-tailed. In practice we encounter various types of data, so we need to include robust regression, which employs a fitting criterion that is not as vulnerable as least squares. The Tukey bisquare weight function, also referred to as the biweight function, produces an M-estimator that is more resistant to regression outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-3189) Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates)
Fan Jiang created SPARK-3189: Summary: Add Robust Regression Algorithm with Tukey bisquare weight function (Biweight Estimates) Key: SPARK-3189 URL: https://issues.apache.org/jira/browse/SPARK-3189 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.2 Reporter: Fan Jiang Priority: Critical Fix For: 1.1.1, 1.2.0 Linear least squares estimates assume the errors have a normal distribution and can behave badly when the errors are heavy-tailed. In practice we encounter various types of data, so we need to include robust regression, which employs a fitting criterion that is not as vulnerable as least squares. The Tukey bisquare weight function, also referred to as the biweight function, produces an M-estimator that is more resistant to regression outliers than the Huber M-estimator (Andersen 2008: 19). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
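As with the Huber ticket above, a minimal, hedged sketch of the Tukey bisquare (biweight) influence function, independent of the actual MLlib Gradient API being proposed; the names and per-example formulation are illustrative.
{code}
object TukeySketch {
  // psi(r) = r * (1 - (r/k)^2)^2 for |r| <= k, and 0 beyond k, so extreme
  // outliers get zero influence; k = 4.685 gives ~95% Gaussian efficiency.
  def psi(r: Double, k: Double = 4.685): Double =
    if (math.abs(r) <= k) { val u = r / k; r * (1 - u * u) * (1 - u * u) }
    else 0.0

  // Per-example gradient of the Tukey M-estimation objective w.r.t. weights.
  def gradient(x: Array[Double], y: Double, w: Array[Double],
               k: Double = 4.685): Array[Double] = {
    val r = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
    x.map(_ * psi(r, k))
  }
}
{code}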
[jira] [Commented] (SPARK-3184) Allow user to specify num tasks to use for a table
[ https://issues.apache.org/jira/browse/SPARK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107748#comment-14107748 ] Andy Konwinski commented on SPARK-3184: --- [~marmbrus], did we figure out if this feature is in fact missing right now? Allow user to specify num tasks to use for a table -- Key: SPARK-3184 URL: https://issues.apache.org/jira/browse/SPARK-3184 Project: Spark Issue Type: Improvement Components: SQL Reporter: Andy Konwinski -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3190) Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] npanj updated SPARK-3190: - Description: While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to generate this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf was: While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to generate this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere --- Key: SPARK-3190 URL: https://issues.apache.org/jira/browse/SPARK-3190 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.3 Environment: Standalone mode running on EC2 Reporter: npanj Priority: Critical While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is an overflow somewhere (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to generate this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3190) Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] npanj updated SPARK-3190: - Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6. (was: Standalone mode running on EC2) Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere --- Key: SPARK-3190 URL: https://issues.apache.org/jira/browse/SPARK-3190 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.3 Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6. Reporter: npanj Priority: Critical While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is some overflow (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf
[jira] [Updated] (SPARK-3190) Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere
[ https://issues.apache.org/jira/browse/SPARK-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] npanj updated SPARK-3190: - Description: While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is some overflow (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf Note: Nodes are labeled 1...6B. was: While creating a graph with 6B nodes and 12B edges, I noticed that the 'numVertices' API returns an incorrect result; 'numEdges' reports the correct number. A few times (with a different dataset of 2.5B nodes) I have also noticed that numVertices is returned as a negative number, so I suspect there is some overflow (maybe we are using an Int for some field?). Here are some details of the experiments I have done so far: 1. Input: numNodes=6101995593 ; noEdges=12163784626 Graph returns: numVertices=1807028297 ; numEdges=12163784626 2. Input: numNodes=2157586441 ; noEdges=2747322705 Graph returns: numVertices=-2137380855 ; numEdges=2747322705 3. Input: numNodes=1725060105 ; noEdges=204176821 Graph: numVertices=1725060105 ; numEdges=2041768213 You can find the code to reproduce this bug here: https://gist.github.com/npanj/92e949d86d08715bf4bf Creation of large graph (over 2.5B nodes) seems to be broken: possible overflow somewhere --- Key: SPARK-3190 URL: https://issues.apache.org/jira/browse/SPARK-3190 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.3 Environment: Standalone mode running on EC2. Using latest code from master branch up to commit #db56f2df1b8027171da1b8d2571d1f2ef1e103b6. Reporter: npanj Priority: Critical
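Worth noting: the two bad values above are exactly what 32-bit truncation of the 64-bit counts would produce (6101995593 mod 2^32 = 1807028297, and 2157586441 wraps to -2137380855 as a signed Int; the third input fits in 31 bits and comes back unchanged), which supports the Int-overflow suspicion. A minimal, standalone Scala sketch of the arithmetic — it only demonstrates the truncation, not where it happens inside GraphX:
{code}
// Standalone check: truncating the 64-bit input counts to 32 bits
// reproduces the bad numVertices values reported above.
object OverflowCheck {
  def main(args: Array[String]): Unit = {
    val inputs = Seq(6101995593L, 2157586441L)
    for (n <- inputs) {
      val truncated = n.toInt // keeps only the low 32 bits, as an Int field would
      println(s"numNodes=$n -> numVertices=$truncated")
    }
    // Prints:
    // numNodes=6101995593 -> numVertices=1807028297   (matches experiment 1)
    // numNodes=2157586441 -> numVertices=-2137380855  (matches experiment 2)
  }
}
{code}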
[jira] [Commented] (SPARK-3169) make-distribution.sh failed
[ https://issues.apache.org/jira/browse/SPARK-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107805#comment-14107805 ] Apache Spark commented on SPARK-3169: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/2101 make-distribution.sh failed --- Key: SPARK-3169 URL: https://issues.apache.org/jira/browse/SPARK-3169 Project: Spark Issue Type: Bug Components: Build Reporter: Guoqiang Li Priority: Blocker {code}./make-distribution.sh -Pyarn -Phadoop-2.3 -Phive-thriftserver -Phive -Dhadoop.version=2.3.0 {code} => {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling TestSuiteBase.class. {noformat}
[jira] [Commented] (SPARK-2707) Upgrade to Akka 2.3
[ https://issues.apache.org/jira/browse/SPARK-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107827#comment-14107827 ] Richard W. Eggert II commented on SPARK-2707: - Upgrading to Akka 2.3 will also allow SparkContexts to be created within other applications that use Akka 2.3, especially Play 2.3 web applications. Akka 2.2 and 2.3 appear to be binary incompatible, which means that Spark cannot currently be used within a Play 2.3 application. Upgrade to Akka 2.3 --- Key: SPARK-2707 URL: https://issues.apache.org/jira/browse/SPARK-2707 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.0.0 Reporter: Yardena Upgrade Akka from 2.2 to 2.3. We want to be able to use new Akka and Spray features directly in the same project.
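For a concrete picture of the conflict, here is an illustrative sbt fragment (the exact version numbers are assumptions for the sake of the example, not taken from this issue). Spark's 1.0.x builds pull in a re-published Akka from the 2.2 line, while Play 2.3 pulls in the 2.3 line, so two binary-incompatible sets of Akka classes end up on one classpath:
{code}
// build.sbt (illustrative): a Play 2.3 application that also embeds Spark.
// Play 2.3 depends on Akka 2.3.x; Spark 1.0.x depends on a re-published
// Akka 2.2.x. The two lines are not binary compatible, so whichever copy
// of the Akka classes is loaded breaks the other library at runtime.
libraryDependencies ++= Seq(
  "com.typesafe.play" %% "play"       % "2.3.3",
  "org.apache.spark"  %% "spark-core" % "1.0.2"
)
{code}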
[jira] [Resolved] (SPARK-3169) make-distribution.sh failed
[ https://issues.apache.org/jira/browse/SPARK-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3169. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 2101 [https://github.com/apache/spark/pull/2101] make-distribution.sh failed --- Key: SPARK-3169 URL: https://issues.apache.org/jira/browse/SPARK-3169 Project: Spark Issue Type: Bug Components: Build Reporter: Guoqiang Li Priority: Blocker Fix For: 1.1.0 {code}./make-distribution.sh -Pyarn -Phadoop-2.3 -Phive-thriftserver -Phive -Dhadoop.version=2.3.0 {code} => {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling TestSuiteBase.class. {noformat}
[jira] [Updated] (SPARK-3169) make-distribution.sh failed
[ https://issues.apache.org/jira/browse/SPARK-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3169: --- Assignee: Tathagata Das make-distribution.sh failed --- Key: SPARK-3169 URL: https://issues.apache.org/jira/browse/SPARK-3169 Project: Spark Issue Type: Bug Components: Build Reporter: Guoqiang Li Assignee: Tathagata Das Priority: Blocker Fix For: 1.1.0 {code}./make-distribution.sh -Pyarn -Phadoop-2.3 -Phive-thriftserver -Phive -Dhadoop.version=2.3.0 {code} => {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling TestSuiteBase.class. {noformat}
[jira] [Resolved] (SPARK-3175) Branch-1.1 SBT build failed for Yarn-Alpha
[ https://issues.apache.org/jira/browse/SPARK-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3175. Resolution: Won't Fix We have to keep these versions slightly out of sync while we are making release candidates, due to the way our Maven publishing plug-in works. If you check out the specific release snapshots, though (e.g. snapshot1, rc1, etc.), then it will work. This issue is only relevant for the older YARN build. Branch-1.1 SBT build failed for Yarn-Alpha -- Key: SPARK-3175 URL: https://issues.apache.org/jira/browse/SPARK-3175 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.1.1 Reporter: Chester Labels: build Fix For: 1.1.1 Original Estimate: 1h Remaining Estimate: 1h When trying to build yarn-alpha on branch-1.1: |branch-1.1|$ sbt/sbt -Pyarn-alpha -Dhadoop.version=2.0.5-alpha projects [info] Loading project definition from /Users/chester/projects/spark/project org.apache.maven.model.building.ModelBuildingException: 1 problem was encountered while building the effective model for org.apache.spark:spark-yarn-alpha_2.10:1.1.0 [FATAL] Non-resolvable parent POM: Could not find artifact org.apache.spark:yarn-parent_2.10:pom:1.1.0 in central (http://repo.maven.apache.org/maven2) and 'parent.relativePath' points at wrong local POM @ line 20, column 11
[jira] [Updated] (SPARK-3170) Bug Fix in Storage UI
[ https://issues.apache.org/jira/browse/SPARK-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3170: --- Priority: Critical (was: Minor) Bug Fix in Storage UI - Key: SPARK-3170 URL: https://issues.apache.org/jira/browse/SPARK-3170 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.0.2 Reporter: uncleGen Priority: Critical A completed stage only needs to remove its own partitions that are no longer cached. Currently, the Storage tab in the Spark UI may lose some RDDs that are actually still cached.
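A minimal sketch of the idea, using hypothetical names (the real code lives in the Spark UI's storage listener; RddInfo and pruneAfterStage here are illustrative stand-ins, not Spark's actual API): on stage completion, prune only the RDDs belonging to that stage, so RDDs cached by other stages remain visible.
{code}
// Hypothetical sketch: a completed stage removes only its own RDDs that no
// longer have cached partitions, instead of pruning every uncached RDD.
object StorageUiSketch {
  case class RddInfo(name: String, numCachedPartitions: Int)

  def pruneAfterStage(
      stageRddIds: Set[Int],
      allRdds: Map[Int, RddInfo]): Map[Int, RddInfo] =
    allRdds.filter { case (id, info) =>
      // Keep RDDs owned by other stages, plus this stage's RDDs that still
      // have cached partitions; drop only this stage's fully uncached RDDs.
      !stageRddIds.contains(id) || info.numCachedPartitions > 0
    }
}
{code}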
[jira] [Resolved] (SPARK-2963) The description about how to build for using CLI and Thrift JDBC server is absent in proper document
[ https://issues.apache.org/jira/browse/SPARK-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2963. Resolution: Fixed Assignee: Kousuke Saruta Thanks - I've merged your fix. The description about how to build for using CLI and Thrift JDBC server is absent in proper document - Key: SPARK-2963 URL: https://issues.apache.org/jira/browse/SPARK-2963 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Fix For: 1.1.0 Currently, if we'd like to use the HiveServer or CLI for Spark SQL, we need to use the -Phive-thriftserver option when building, but its description is incomplete.