[jira] [Updated] (SPARK-4331) SBT Scalastyle doesn't work for the sources under hive's v0.12.0 and v0.13.1
[ https://issues.apache.org/jira/browse/SPARK-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4331: Target Version/s: 1.4.0 (was: 1.3.0) SBT Scalastyle doesn't work for the sources under hive's v0.12.0 and v0.13.1 Key: SPARK-4331 URL: https://issues.apache.org/jira/browse/SPARK-4331 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 1.3.0 Reporter: Kousuke Saruta The v0.13.1 and v0.12.0 directories do not follow the standard directory structure expected by sbt's Scalastyle plugin, so Scalastyle doesn't check the sources under those directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4119) Don't rely on HIVE_DEV_HOME to find .q files
[ https://issues.apache.org/jira/browse/SPARK-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4119: Target Version/s: 1.4.0 (was: 1.3.0) Don't rely on HIVE_DEV_HOME to find .q files Key: SPARK-4119 URL: https://issues.apache.org/jira/browse/SPARK-4119 Project: Spark Issue Type: Test Components: SQL Affects Versions: 1.1.1 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Minor After merging in Hive 0.13.1 support, a bunch of .q files and golden answer files got updated. Unfortunately, some .q files were also updated in Hive itself. For example, an ORDER BY clause was added to groupby1_limit.q for a bug fix. With HIVE_DEV_HOME set, developers working on Hive 0.12.0 may end up with false test failures, because .q files are looked up from HIVE_DEV_HOME and the outdated .q files are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2472) Spark SQL Thrift server sometimes assigns wrong job group name
[ https://issues.apache.org/jira/browse/SPARK-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2472: Target Version/s: 1.4.0 (was: 1.3.0) Spark SQL Thrift server sometimes assigns wrong job group name -- Key: SPARK-2472 URL: https://issues.apache.org/jira/browse/SPARK-2472 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Cheng Lian Priority: Minor Sample beeline session used to reproduce this issue:
{code}
0: jdbc:hive2://localhost:1 drop table test;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.614 seconds)
0: jdbc:hive2://localhost:1 create table hive_table_copy as select * from hive_table;
+------+--------+
| key  | value  |
+------+--------+
+------+--------+
No rows selected (0.493 seconds)
{code}
The second statement results in two stages; the first stage is labeled with the preceding {{drop table}} statement rather than the CTAS statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5165) Add support for rollup and cube in sqlcontext
[ https://issues.apache.org/jira/browse/SPARK-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-5165: Target Version/s: 1.4.0 (was: 1.3.0) Add support for rollup and cube in sqlcontext - Key: SPARK-5165 URL: https://issues.apache.org/jira/browse/SPARK-5165 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.2.0 Reporter: Fei Wang Add support for rollup and cube in sqlcontext -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
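For reference on SPARK-5165, a minimal sketch of the kind of query this feature would enable through SQLContext. The {{sales}} table and its columns are hypothetical, and the syntax shown is the existing HiveQL ROLLUP/CUBE syntax:
{code}
// ROLLUP aggregates at every prefix of the grouping columns plus a grand
// total; CUBE aggregates at every subset of the grouping columns.
val rolledUp = sqlContext.sql(
  """SELECT region, product, SUM(amount) AS total
    |FROM sales
    |GROUP BY region, product WITH ROLLUP""".stripMargin)

val cubed = sqlContext.sql(
  """SELECT region, product, SUM(amount) AS total
    |FROM sales
    |GROUP BY region, product WITH CUBE""".stripMargin)
{code}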
[jira] [Updated] (SPARK-4760) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files
[ https://issues.apache.org/jira/browse/SPARK-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4760: Target Version/s: 1.4.0 (was: 1.3.0) ANALYZE TABLE table COMPUTE STATISTICS noscan failed estimating table size for tables created from Parquet files -- Key: SPARK-4760 URL: https://issues.apache.org/jira/browse/SPARK-4760 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Jianshi Huang Priority: Critical In an older Spark version built around Oct. 12, I was able to use ANALYZE TABLE table COMPUTE STATISTICS noscan to get the estimated table size, which is important for optimizing joins. (I'm joining 15 small dimension tables, and this is crucial to me.) In more recent Spark builds, it fails to estimate the table size unless I remove noscan. Here are the statistics I got using DESC EXTENDED:
old: parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1417763591, totalSize=56166}
new: parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1}
I've also tried turning off spark.sql.hive.convertMetastoreParquet in my spark-defaults.conf and the result is unaffected (in both versions). Looks like the Parquet support in the new Hive (0.13.1) is broken? Jianshi -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5295) Stabilize data types
[ https://issues.apache.org/jira/browse/SPARK-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-5295: Target Version/s: 1.4.0 (was: 1.3.0) Stabilize data types Key: SPARK-5295 URL: https://issues.apache.org/jira/browse/SPARK-5295 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Yin Huai
1. We expose all the stuff in data types right now, including NumericTypes, etc. These should be hidden from users. We should only expose the leaf types.
2. Remove the DeveloperAPI tag from the common types.
3. Specify the internal type, external Scala type, and external Java type for each data type.
4. Add conversion functions between the internal type, external Scala type, and external Java type to each data type.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
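To illustrate items 3 and 4 of SPARK-5295, a rough sketch of what per-type conversion hooks could look like. The method names are hypothetical, not Spark's actual API:
{code}
// Hypothetical sketch only -- not the real Spark API.
abstract class DataType {
  // Convert a value from the internal (Catalyst) representation to the
  // external Scala representation, e.g. Decimal -> scala.math.BigDecimal.
  def toScala(internalValue: Any): Any
  // Convert to the external Java representation, e.g. java.math.BigDecimal.
  def toJava(internalValue: Any): Any
  // Convert an external (Scala or Java) value back to the internal one.
  def toInternal(externalValue: Any): Any
}
{code}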
[jira] [Updated] (SPARK-5100) Spark Thrift server monitor page
[ https://issues.apache.org/jira/browse/SPARK-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-5100: Target Version/s: 1.4.0 (was: 1.3.0) Spark Thrift server monitor page Key: SPARK-5100 URL: https://issues.apache.org/jira/browse/SPARK-5100 Project: Spark Issue Type: New Feature Components: SQL, Web UI Reporter: Yi Tian Priority: Critical Attachments: Spark Thrift-server monitor page.pdf, prototype-screenshot.png In the latest Spark release, there is a Spark Streaming tab on the driver web UI, which shows information about the running streaming application. It would be helpful to provide a similar monitor page for the Thrift server, because both streaming applications and the Thrift server are long-running applications, and their details do not show up on the stage page or job page. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4852) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers
[ https://issues.apache.org/jira/browse/SPARK-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4852: Target Version/s: 1.4.0 (was: 1.3.0, 1.2.1) Hive query plan deserialization failure caused by shaded hive-exec jar file when generating golden answers -- Key: SPARK-4852 URL: https://issues.apache.org/jira/browse/SPARK-4852 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Cheng Lian Priority: Minor When adding Hive 0.13.1 support for the Spark SQL Thrift server in PR [2685|https://github.com/apache/spark/pull/2685], Kryo 2.22, used by the original hive-exec-0.13.1.jar, was replaced with Kryo 2.21, the version used by Spark SQL, because of dependency hell. Unfortunately, Kryo 2.21 has a known bug that may cause Hive query plan deserialization failures. This bug was fixed in Kryo 2.22. Normally, this issue doesn't affect Spark SQL because we don't even generate Hive query plans. But when running Hive test suites like {{HiveCompatibilitySuite}}, golden answer files must be generated by Hive, which triggers this issue. A workaround is to replace {{hive-exec-0.13.1.jar}} under {{$HIVE_HOME/lib}} with Spark's {{hive-exec-0.13.1a.jar}} and {{kryo-2.21.jar}} under {{$SPARK_DEV_HOME/lib_managed/jars}}, and then add {{$HIVE_HOME/lib}} to {{$HADOOP_CLASSPATH}}. Upgrading to some newer version of Kryo which is binary compatible with Kryo 2.22 (if there is one) may fix this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4476) Use MapType for dict in json which has unique keys in each row.
[ https://issues.apache.org/jira/browse/SPARK-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4476: Target Version/s: 1.4.0 (was: 1.3.0) Use MapType for dict in json which has unique keys in each row. --- Key: SPARK-4476 URL: https://issues.apache.org/jira/browse/SPARK-4476 Project: Spark Issue Type: New Feature Components: SQL Reporter: Davies Liu Priority: Critical For a jsonRDD like this:
{code}
{a: 1}
{b: 2}
{c: 3}
{d: 4}
{e: 5}
{code}
It will create a StructType with 5 fields in it, each field coming from a different row. This will be a problem if the RDD is large: a StructType with thousands or millions of fields is hard to work with (it will cause stack overflows during serialization). It should be a MapType in this case. We need a clear rule to decide whether StructType or MapType will be used for dicts in JSON data. cc [~yhuai] [~marmbrus] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
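A minimal sketch of the SPARK-4476 scenario, assuming the Spark 1.2-era {{jsonRDD}} API; the sample records are hypothetical:
{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val records = sc.parallelize(Seq("""{"a": 1}""", """{"b": 2}""", """{"c": 3}"""))

// Current behavior: schema inference unions the keys of every row into a
// single StructType(a: Int, b: Int, c: Int), which grows without bound when
// each row carries a unique key.
// Proposed behavior for this case: infer MapType(StringType, IntegerType),
// so the schema stays constant no matter how many distinct keys appear.
val table = sqlContext.jsonRDD(records)
table.printSchema()
{code}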
[jira] [Updated] (SPARK-4176) Support decimals with precision > 18 in Parquet
[ https://issues.apache.org/jira/browse/SPARK-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4176: Target Version/s: 1.4.0 (was: 1.3.0) Support decimals with precision > 18 in Parquet --- Key: SPARK-4176 URL: https://issues.apache.org/jira/browse/SPARK-4176 Project: Spark Issue Type: New Feature Components: SQL Reporter: Matei Zaharia After https://issues.apache.org/jira/browse/SPARK-3929, only decimals with precision <= 18 (those that can be read into a Long) will be readable from Parquet, so we still need more work to support the larger ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4801) Add CTE capability to HiveContext
[ https://issues.apache.org/jira/browse/SPARK-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4801: Target Version/s: 1.4.0 (was: 1.3.0) Add CTE capability to HiveContext - Key: SPARK-4801 URL: https://issues.apache.org/jira/browse/SPARK-4801 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Jacob Davis This is a request to add CTE functionality to HiveContext. Common Table Expressions were added in Hive 0.13.0 with HIVE-1180. Using CTE-style syntax within HiveContext currently results in the following error:
{code}
Caused by: scala.MatchError: TOK_CTE (of class org.apache.hadoop.hive.ql.parse.ASTNode)
at org.apache.spark.sql.hive.HiveQl$$anonfun$13.apply(HiveQl.scala:500)
at org.apache.spark.sql.hive.HiveQl$$anonfun$13.apply(HiveQl.scala:500)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:500)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:248)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
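For reference on SPARK-4801, a minimal query that exercises the TOK_CTE path; the table name is hypothetical:
{code}
// Hive 0.13 CTE syntax; currently triggers the MatchError above in HiveQl.
val result = hiveContext.sql(
  """WITH t AS (SELECT key, value FROM src)
    |SELECT t.key FROM t WHERE t.key < 10""".stripMargin)
{code}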
[jira] [Updated] (SPARK-5680) Sum function on all null values, should return zero
[ https://issues.apache.org/jira/browse/SPARK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-5680: Target Version/s: 1.4.0 (was: 1.3.0) Sum function on all null values, should return zero --- Key: SPARK-5680 URL: https://issues.apache.org/jira/browse/SPARK-5680 Project: Spark Issue Type: Bug Components: SQL Reporter: Venkata Ramana G Priority: Minor SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
Current output:  NULL  NULL  NULL  NULL
Expected output: 0.0   NULL  NULL  NULL
This fixes Hive's udaf_number_format.q test. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
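One possible shape of a fix for SPARK-5680, as a sketch rather than the actual patch: coalesce the aggregate result with a zero literal of the child's type, assuming the Catalyst expression classes below:
{code}
import org.apache.spark.sql.catalyst.expressions.{Cast, Coalesce, Expression, Literal, Sum}

// Sketch only: make SUM return 0 instead of NULL when every input is null.
def sumOrZero(child: Expression): Expression =
  Coalesce(Seq(Sum(child), Cast(Literal(0), child.dataType)))
{code}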
[jira] [Updated] (SPARK-3860) Improve dimension joins
[ https://issues.apache.org/jira/browse/SPARK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3860: Target Version/s: 1.4.0 (was: 1.3.0) Improve dimension joins --- Key: SPARK-3860 URL: https://issues.apache.org/jira/browse/SPARK-3860 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Michael Armbrust Priority: Critical This is an umbrella ticket for improving performance for joining multiple dimension tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2087: Target Version/s: 1.4.0 (was: 1.3.0) Clean Multi-user semantics for thrift JDBC/ODBC server. --- Key: SPARK-2087 URL: https://issues.apache.org/jira/browse/SPARK-2087 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Priority: Minor Configuration and temporary tables should exist per-user. Cached tables should be shared across users. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5436) Validate GradientBoostedTrees during training
[ https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323289#comment-14323289 ] Chris T commented on SPARK-5436: I thought about this too, but I think there are cases where a user might wish to build a model with N trees and examine the error rate after the fact. For example, we might be worried about finding global vs. local minima, want to assess the rate at which a model starts to overfit, or want to do some other kind of testing. There are valid reasons to want both a specified number of trees and independent scoring of the model against a testData RDD during the build phase. It seems both of these cases could easily be supported concurrently. Validate GradientBoostedTrees during training - Key: SPARK-5436 URL: https://issues.apache.org/jira/browse/SPARK-5436 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley For Gradient Boosting, it would be valuable to compute test error on a separate validation set during training. That way, training could stop early based on the test error (or some other metric specified by the user). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5436) Validate GradientBoostedTrees during training
[ https://issues.apache.org/jira/browse/SPARK-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323323#comment-14323323 ] Joseph K. Bradley commented on SPARK-5436: -- Yep, that sounds like what I had in mind: {code} def evaluateEachIteration(data: RDD[LabeledPoint], evaluator or maybe use training metric): Array[Double] {code} where it essentially calls predict() once but keeps the intermediate results after each boosting stage, so that it runs in the same big-O time as predict(). Validate GradientBoostedTrees during training - Key: SPARK-5436 URL: https://issues.apache.org/jira/browse/SPARK-5436 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley For Gradient Boosting, it would be valuable to compute test error on a separate validation set during training. That way, training could stop early based on the test error (or some other metric specified by the user). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
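A sketch of that idea for SPARK-5436; the signature is illustrative, not the final MLlib API, and a real implementation would cache the intermediate RDDs:
{code}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.DecisionTreeModel
import org.apache.spark.rdd.RDD

// Evaluate the ensemble after each boosting stage in one pass, keeping a
// running sum of weighted tree predictions per point, so the total cost is
// O(numTrees * numPoints) -- the same big-O as a single predict().
def evaluateEachIteration(
    data: RDD[LabeledPoint],
    trees: Array[DecisionTreeModel],
    treeWeights: Array[Double],
    loss: (Double, Double) => Double): Array[Double] = {
  var predAndPoint = data.map(p => (0.0, p))
  trees.indices.map { i =>
    predAndPoint = predAndPoint.map { case (pred, p) =>
      (pred + treeWeights(i) * trees(i).predict(p.features), p)
    }
    // Mean loss over the dataset after stage i.
    predAndPoint.map { case (pred, p) => loss(pred, p.label) }.mean()
  }.toArray
}
{code}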
[jira] [Created] (SPARK-5846) Spark SQL should set job description and pool *before* running jobs
Kay Ousterhout created SPARK-5846: - Summary: Spark SQL should set job description and pool *before* running jobs Key: SPARK-5846 URL: https://issues.apache.org/jira/browse/SPARK-5846 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.1, 1.3.0 Reporter: Kay Ousterhout Assignee: Kay Ousterhout Spark SQL currently sets the scheduler pool and job description AFTER jobs run (see https://github.com/apache/spark/blob/master/sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala#L168 -- which happens after calling hiveContext.sql). This should be done before the job is run. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
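A minimal sketch of the intended ordering for SPARK-5846; {{pool}} and {{statement}} are hypothetical variables standing in for the Thrift server's session pool and SQL text:
{code}
// Set the pool and job description *before* triggering any jobs, so the
// stages the statement spawns are labeled correctly in the UI.
hiveContext.sparkContext.setLocalProperty("spark.scheduler.pool", pool)
hiveContext.sparkContext.setJobDescription(statement)
val result = hiveContext.sql(statement) // jobs run here, already labeled
{code}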
[jira] [Commented] (SPARK-5005) Failed to start spark-shell when using yarn-client mode with the Spark1.2.0
[ https://issues.apache.org/jira/browse/SPARK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323683#comment-14323683 ] anuj commented on SPARK-5005: - I am having the same issue. @yangping wu, what was the resolution in your case? Failed to start spark-shell when using yarn-client mode with the Spark1.2.0 Key: SPARK-5005 URL: https://issues.apache.org/jira/browse/SPARK-5005 Project: Spark Issue Type: Bug Components: Spark Core, Spark Shell, YARN Affects Versions: 1.2.0 Environment: Spark 1.2.0 Hadoop 2.2.0 Reporter: yangping wu Priority: Minor Original Estimate: 8h Remaining Estimate: 8h I am using Spark 1.2.0, but when I start spark-shell in yarn-client mode ({code}MASTER=yarn-client bin/spark-shell{code}), it fails and the error message is
{code}
Unknown/unsupported param List(--executor-memory, 1024m, --executor-cores, 8, --num-executors, 2)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH         Path to your application's JAR file (required)
  --class CLASS_NAME     Name of your application's main class (required)
  --args ARGS            Arguments to be passed to your application's main class. Mutliple invocations are possible, each will be passed in order.
  --num-executors NUM    Number of executors to start (Default: 2)
  --executor-cores NUM   Number of cores for the executors (Default: 1)
  --executor-memory MEM  Memory per executor (e.g. 1000M, 2G) (Default: 1G)
{code}
But when I use Spark 1.1.0 and also use {code}MASTER=yarn-client bin/spark-shell{code} to start spark-shell, it works. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5837) HTTP 500 if try to access Spark UI in yarn-cluster or yarn-client mode
Marco Capuccini created SPARK-5837: -- Summary: HTTP 500 if try to access Spark UI in yarn-cluster or yarn-client mode Key: SPARK-5837 URL: https://issues.apache.org/jira/browse/SPARK-5837 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.2.1, 1.2.0 Reporter: Marco Capuccini Priority: Blocker Both Spark 1.2.0 and Spark 1.2.1 return this error when I try to access the Spark UI while running on YARN (version 2.4.0):
HTTP ERROR 500
Problem accessing /proxy/application_1423564210894_0017/. Reason: Connection refused
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:187)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:344)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:79)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
[jira] [Updated] (SPARK-5831) When checkpoint file size is bigger than 10, then delete them
[ https://issues.apache.org/jira/browse/SPARK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5831: - Priority: Trivial (was: Minor) Assignee: meiyoula When checkpoint file size is bigger than 10, then delete them - Key: SPARK-5831 URL: https://issues.apache.org/jira/browse/SPARK-5831 Project: Spark Issue Type: Improvement Components: Streaming Reporter: meiyoula Assignee: meiyoula Priority: Trivial Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5831) When checkpoint file size is bigger than 10, then delete them
[ https://issues.apache.org/jira/browse/SPARK-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5831. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4621 [https://github.com/apache/spark/pull/4621] When checkpoint file size is bigger than 10, then delete them - Key: SPARK-5831 URL: https://issues.apache.org/jira/browse/SPARK-5831 Project: Spark Issue Type: Improvement Components: Streaming Reporter: meiyoula Priority: Minor Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4010) Spark UI returns 500 in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322851#comment-14322851 ] Marco Capuccini commented on SPARK-4010: Yes, and it seems to be fixed... but I still have the problem in Spark 1.2.1 and 1.2.0. Spark UI returns 500 in yarn-client mode - Key: SPARK-4010 URL: https://issues.apache.org/jira/browse/SPARK-4010 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.2.0 Reporter: Guoqiang Li Assignee: Guoqiang Li Priority: Blocker Fix For: 1.1.1, 1.2.0 http://host/proxy/application_id/stages/ returns this result:
{noformat}
HTTP ERROR 500
Problem accessing /proxy/application_1411648907638_0281/stages/. Reason: Connection refused
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:336)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1183)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
[jira] [Issue Comment Deleted] (SPARK-4010) Spark UI returns 500 in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Capuccini updated SPARK-4010: --- Comment: was deleted (was: Yes, and it seems to be fixed... but I still have the problem in Spark 1.2.1, and 1.2.0.) Spark UI returns 500 in yarn-client mode - Key: SPARK-4010 URL: https://issues.apache.org/jira/browse/SPARK-4010 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.2.0 Reporter: Guoqiang Li Assignee: Guoqiang Li Priority: Blocker Fix For: 1.1.1, 1.2.0 http://host/proxy/application_id/stages/ returns this result:
{noformat}
HTTP ERROR 500
Problem accessing /proxy/application_1411648907638_0281/stages/. Reason: Connection refused
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:280)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:185)
at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:336)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1183)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
[jira] [Resolved] (SPARK-1697) Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-1697. -- Resolution: Duplicate This is either stale, or likely the same issue identified in SPARK-2243. Driver error org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException -- Key: SPARK-1697 URL: https://issues.apache.org/jira/browse/SPARK-1697 Project: Spark Issue Type: Bug Components: Scheduler Reporter: Arup Malakar We are running spark-streaming 0.9.0 on top of YARN (Hadoop 2.2.0-cdh5.0.0-beta-2). It reads from Kafka and processes the data. So far we hadn't seen any issues, but today we saw an exception in the driver log, and it is not consuming Kafka messages any more. Here is the exception we saw:
{code}
2014-05-01 10:00:43,962 [Result resolver thread-3] WARN org.apache.spark.scheduler.TaskSetManager - Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: http://10.50.40.85:53055/broadcast_2412
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
[jira] [Updated] (SPARK-5835) Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1
[ https://issues.apache.org/jira/browse/SPARK-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5835: - Component/s: Tests Priority: Minor (was: Major) You say you're not running in parallel, but are you creating multiple SparkContexts? If so, this is the same as https://issues.apache.org/jira/browse/SPARK-2243 Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1 -- Key: SPARK-5835 URL: https://issues.apache.org/jira/browse/SPARK-5835 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.0.0 Reporter: sam Priority: Minor Note, I do not believe this is related to SPARK-2984, since I have speculative execution off (it's off by default in 1.0.0). I intermittently get the following stack trace in my unit tests. I'm using specs2 and I have sequential in the tests (so they should not be bumping into each other), and I also have `parallelExecution in Test := false` in my `build.sbt`. This isn't a major showstopper; it just means our CI pipelines need some retry logic to work around the failing tests.
[error] Could not run test my.test.Class: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4.0:0 failed 1 times, most recent failure: Exception failure in TID 6 on host localhost: java.io.FileNotFoundException: http://blar.blar.blar.blar:59528/broadcast_1
[error] sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1834)
[error] sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:90)
[error] sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1431)
[error] sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1429)
[error] java.security.AccessController.doPrivileged(Native Method)
[error] java.security.AccessController.doPrivileged(AccessController.java:713)
[error] sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1428)
[error] org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196)
[error] org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
[error] sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
[error] sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] java.lang.reflect.Method.invoke(Method.java:483)
[error] java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
[error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
[error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
[error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[error] java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[error] scala.collection.immutable.$colon$colon.readObject(List.scala:362)
[error] sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
[error] sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] java.lang.reflect.Method.invoke(Method.java:483)
[jira] [Updated] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5770: - Priority: Minor (was: Major) Use addJar() to upload a new jar file to executor, it can't be added to classloader --- Key: SPARK-5770 URL: https://issues.apache.org/jira/browse/SPARK-5770 Project: Spark Issue Type: Bug Components: Spark Core Reporter: meiyoula Priority: Minor First use addJar() to upload a jar to the executors, then change the jar's content and upload it again. We can see that the jar file on local disk has been updated, but the classloader still loads the old one. The executor log has no error or exception pointing to this. I used spark-shell to test it, with spark.files.overwrite set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
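A repro sketch of the SPARK-5770 description; the jar path is hypothetical:
{code}
// In spark-shell, with spark.files.overwrite=true:
sc.addJar("/tmp/udfs.jar") // executors fetch and load version 1
// ...rebuild /tmp/udfs.jar with new contents, then:
sc.addJar("/tmp/udfs.jar") // the local file on the executor is updated,
                           // but the classloader keeps serving the old classes
{code}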
[jira] [Resolved] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5770. -- Resolution: Won't Fix PR was withdrawn Use addJar() to upload a new jar file to executor, it can't be added to classloader --- Key: SPARK-5770 URL: https://issues.apache.org/jira/browse/SPARK-5770 Project: Spark Issue Type: Bug Components: Spark Core Reporter: meiyoula Priority: Minor First use addJar() to upload a jar to the executors, then change the jar's content and upload it again. We can see that the jar file on local disk has been updated, but the classloader still loads the old one. The executor log has no error or exception pointing to this. I used spark-shell to test it, with spark.files.overwrite set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322647#comment-14322647 ] Sean Owen commented on SPARK-1867: -- With Bjorn Jonsson here, we think we located the cause, at least for those people here using CDH 5.2. It seems to occur with the MR1 flavor, when using standalone mode (not YARN). The cause seems to be that the MR1-flavored dependencies get on the classpath along with non-MR1 Hadoop dependencies. So I'm going to re-resolve this as not a problem _with Spark_ but with a particular packaging. Bjorn mentions it can be worked around by temporarily changing an env variable for Spark's jobs: setting {{HADOOP_CONF_DIR}} to {{/etc/hadoop/conf.cloudera.YARN-1}}. It doesn't seem to happen with 5.3. This might explain why people got it to work by using a 'stock' distribution; in that case they'd not have any MR1 dependencies, I believe. This also relates to https://issues.apache.org/jira/browse/SPARK-4048 which may have been the eventual fix; Marcelo might be able to comment. (Also of interest, note that https://bugs.openjdk.java.net/browse/JDK-7172206 may be causing the real underlying exception to be masked, which doesn't help.) I don't know if it's specific to the CDH 5.2 MR1 stuff, and it appears resolved later anyway. If so, let's re-close as NotAProblem since it's not to do with Spark, but I'll pause a beat for that. Spark Documentation Error causes java.lang.IllegalStateException: unread block data --- Key: SPARK-1867 URL: https://issues.apache.org/jira/browse/SPARK-1867 Project: Spark Issue Type: Bug Components: Spark Core Reporter: sam I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is:
1. Installing Spark 0.9.1 according to the documentation on the website, along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
2. Building a fat jar with a Spark app with sbt, then trying to run it on the cluster.
I've also included code snippets, and sbt deps at the bottom. When I've Googled this, there seem to be two somewhat vague responses:
a) Mismatching spark versions on nodes/user code
b) Need to add more jars to the SparkConf
Now I know that (b) is not the problem, having successfully run the same code on other clusters while only including one jar (it's a fat jar). But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a mismatching version exception: you have user code using version X and node Y has version Z. I would be very grateful for advice on this.
The exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/16 18:05:31 INFO
[jira] [Resolved] (SPARK-5830) Don't create unnecessary directory for local root dir
[ https://issues.apache.org/jira/browse/SPARK-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5830. -- Resolution: Duplicate Don't create unnecessary directory for local root dir - Key: SPARK-5830 URL: https://issues.apache.org/jira/browse/SPARK-5830 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Weizhong Priority: Minor Currently an unnecessary directory is created under the local root directory, and it is not deleted after the application exits. For example: before, the tmp dir created was /tmp/spark-UUID; now it is /tmp/spark-UUID/spark-UUID, so the outer dir /tmp/spark-UUID will not be deleted as a local root directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5832) Add Affinity Propagation clustering algorithm
Liang-Chi Hsieh created SPARK-5832: -- Summary: Add Affinity Propagation clustering algorithm Key: SPARK-5832 URL: https://issues.apache.org/jira/browse/SPARK-5832 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Liang-Chi Hsieh -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters
[ https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-5296: -- Summary: Predicate Pushdown (BaseRelation) to have an interface that will accept more filters (was: Predicate Pushdown (BaseRelation) to have an interface that will accept OR filters) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters Key: SPARK-5296 URL: https://issues.apache.org/jira/browse/SPARK-5296 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet Assignee: Cheng Lian Priority: Critical Currently, the BaseRelation API allows a FilteredRelation to handle an Array[Filter] which represents filter expressions that are applied as an AND operator. We should support OR operations in a BaseRelation as well. I'm not sure what this would look like in terms of API changes, but it almost seems like a FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters
[ https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322550#comment-14322550 ] Cheng Lian commented on SPARK-5296: --- Nested AND/OR/NOT filters can be processed in a way very similar to the Parquet filter push-down code. Predicate Pushdown (BaseRelation) to have an interface that will accept more filters Key: SPARK-5296 URL: https://issues.apache.org/jira/browse/SPARK-5296 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet Assignee: Cheng Lian Priority: Critical Currently, the BaseRelation API allows a FilteredRelation to handle an Array[Filter] which represents filter expressions that are applied as an AND operator. We should support OR operations in a BaseRelation as well. I'm not sure what this would look like in terms of API changes, but it almost seems like a FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
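A sketch of what handling nested predicates can look like, mirroring the recursive structure of the Parquet filter push-down code; the {{And}}/{{Or}}/{{Not}} source filter classes are assumptions about the extended data source filter API, and the string target is just for illustration:
{code}
import org.apache.spark.sql.sources._

// Recursively translate a (possibly nested) source filter into a
// backend-specific predicate, rejecting the whole subtree if any leaf
// is untranslatable.
def translate(filter: Filter): Option[String] = filter match {
  case EqualTo(attr, value)     => Some(s"$attr = $value")
  case GreaterThan(attr, value) => Some(s"$attr > $value")
  case And(left, right) =>
    for (l <- translate(left); r <- translate(right)) yield s"($l AND $r)"
  case Or(left, right) =>
    for (l <- translate(left); r <- translate(right)) yield s"($l OR $r)"
  case Not(child) =>
    translate(child).map(c => s"(NOT $c)")
  case _ => None // unsupported leaf: don't push this subtree down
}
{code}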
[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept more filters
[ https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322556#comment-14322556 ] Apache Spark commented on SPARK-5296: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/4623 Predicate Pushdown (BaseRelation) to have an interface that will accept more filters Key: SPARK-5296 URL: https://issues.apache.org/jira/browse/SPARK-5296 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet Assignee: Cheng Lian Priority: Critical Currently, the BaseRelation API allows a FilteredRelation to handle an Array[Filter] which represents filter expressions that are applied as an AND operator. We should support OR operations in a BaseRelation as well. I'm not sure what this would look like in terms of API changes, but it almost seems like a FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322559#comment-14322559 ] Littlestar commented on SPARK-3638: --- Oh, it was introduced by the kinesis-asl profile only. I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. Now I build Spark with the kinesis-asl profile, and it works with httpclient 4.2.6, thanks. mvn dependency:tree Commons HTTP client dependency conflict in extras/kinesis-asl module Key: SPARK-3638 URL: https://issues.apache.org/jira/browse/SPARK-3638 Project: Spark Issue Type: Bug Components: Examples, Streaming Affects Versions: 1.1.0 Reporter: Aniket Bhatnagar Labels: dependencies Fix For: 1.1.1, 1.2.0 Followed the instructions mentioned at https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md and, when running the example, I get the following error:
{code}
Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
{code}
I believe this is due to the dependency conflict described at http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module
[ https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322518#comment-14322518 ] Aniket Bhatnagar commented on SPARK-3638: - Did you build Spark with the kinesis-asl profile? The standard distribution does not have this profile enabled, and therefore you would have to roll your own as described in https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md (mvn -Pkinesis-asl -DskipTests clean package). Commons HTTP client dependency conflict in extras/kinesis-asl module Key: SPARK-3638 URL: https://issues.apache.org/jira/browse/SPARK-3638 Project: Spark Issue Type: Bug Components: Examples, Streaming Affects Versions: 1.1.0 Reporter: Aniket Bhatnagar Labels: dependencies Fix For: 1.1.1, 1.2.0 Followed the instructions mentioned at https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md and, when running the example, I get the following error:
{code}
Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)
{code}
I believe this is due to the dependency conflict described at http://mail-archives.apache.org/mod_mbox/spark-dev/201409.mbox/%3ccajob8btdxks-7-spjj5jmnw0xsnrjwdpcqqtjht1hun6j4z...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
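For reference, the build and inspection commands discussed in this thread; the -Dincludes filter on dependency:tree is an optional refinement, not something quoted from the thread:
{code}
# Build Spark with the Kinesis ASL profile (not enabled in the standard distribution):
mvn -Pkinesis-asl -DskipTests clean package

# Inspect which httpclient version the build actually resolves:
mvn dependency:tree -Dincludes=org.apache.httpcomponents:httpclient
{code}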
[jira] [Commented] (SPARK-5832) Add Affinity Propagation clustering algorithm
[ https://issues.apache.org/jira/browse/SPARK-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322528#comment-14322528 ] Apache Spark commented on SPARK-5832: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/4622 Add Affinity Propagation clustering algorithm - Key: SPARK-5832 URL: https://issues.apache.org/jira/browse/SPARK-5832 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Liang-Chi Hsieh -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5804) Explicitly manage cache in cross-validation k-fold loop
[ https://issues.apache.org/jira/browse/SPARK-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5804: - Assignee: Peter Rudenko Explicitly manage cache in cross-validation k-fold loop -- Key: SPARK-5804 URL: https://issues.apache.org/jira/browse/SPARK-5804 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Assignee: Peter Rudenko Priority: Minor Fix For: 1.3.0 On a big dataset, explicitly unpersisting the train and validation folds allows more data to be loaded into memory in the next loop iteration. In my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
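A minimal sketch of the pattern, using MLUtils.kFold and logistic regression as stand-ins rather than Peter's actual patch:
{code}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Average validation accuracy over k folds, explicitly caching and then
// unpersisting each fold so the next iteration can use the freed memory.
def crossValidate(data: RDD[LabeledPoint], numFolds: Int = 3): Double = {
  val metrics = MLUtils.kFold(data, numFolds, seed = 42).map { case (training, validation) =>
    training.cache()
    validation.cache()
    val model = LogisticRegressionWithSGD.train(training, 100)
    val accuracy = validation
      .map(p => if (model.predict(p.features) == p.label) 1.0 else 0.0)
      .mean()
    // Explicitly free both folds before the next iteration.
    training.unpersist()
    validation.unpersist()
    accuracy
  }
  metrics.sum / metrics.length
}
{code}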
[jira] [Resolved] (SPARK-5767) Migrate Parquet data source to the write support of the data source API
[ https://issues.apache.org/jira/browse/SPARK-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-5767. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4563 [https://github.com/apache/spark/pull/4563] Migrate Parquet data source to the write support of the data source API --- Key: SPARK-5767 URL: https://issues.apache.org/jira/browse/SPARK-5767 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Fix For: 1.3.0 Migrate to the newly introduced data source write support API (SPARK-5658). Add support for overwriting and appending to existing tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
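For context, a sketch of the 1.3-era write path this migration enables; df and the path are placeholders, and the save overload shown is my recollection of the 1.3 DataFrame API:
{code}
import org.apache.spark.sql.SaveMode

// Append to, or overwrite, an existing Parquet dataset through the
// data source write support API.
df.save("/tmp/events.parquet", "parquet", SaveMode.Append)
df.save("/tmp/events.parquet", "parquet", SaveMode.Overwrite)
{code}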
[jira] [Resolved] (SPARK-4553) query for parquet table with string fields in Spark SQL Hive gets binary results
[ https://issues.apache.org/jira/browse/SPARK-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-4553. --- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4563 [https://github.com/apache/spark/pull/4563] query for parquet table with string fields in Spark SQL Hive gets binary results -- Key: SPARK-4553 URL: https://issues.apache.org/jira/browse/SPARK-4553 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Fei Wang Assignee: Cheng Lian Priority: Blocker Fix For: 1.3.0 Run:
{code}
create table test_parquet(key int, value string) stored as parquet;
insert into table test_parquet select * from src;
select * from test_parquet;
{code}
The result contains raw binary instead of strings:
{noformat}
... 282 [B@38fda3b 138 [B@1407a24 238 [B@12de6fb 419 [B@6c97695 15 [B@4885067 118 [B@156a8d3 72 [B@65d20dd 90 [B@4c18906 307 [B@60b24cc 19 [B@59cf51b 435 [B@39fdf37 10 [B@4f799d7 277 [B@3950951 273 [B@596bf4b 306 [B@3e91557 224 [B@3781d61 309 [B@2d0d128
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
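For readers hitting this on affected versions, the spark.sql.parquet.binaryAsString option may serve as a workaround, telling Spark SQL to interpret Parquet binary columns as strings. A sketch, assuming a HiveContext named sqlContext and the test_parquet table from the report:
{code}
// Treat Parquet BINARY columns as strings when reading.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
sqlContext.sql("SELECT * FROM test_parquet").collect().foreach(println)
{code}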
[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5834: -- Description: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules require httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. was: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules required httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules require httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5833) Adds REFRESH TABLE command to refresh external data source tables
[ https://issues.apache.org/jira/browse/SPARK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-5833: -- Description: This command can be used to refresh (possibly cached) metadata stored in external data source tables. For example, for JSON tables, it forces schema inference; for Parquet tables, it forces schema merging and partition discovery. Adds REFRESH TABLE command to refresh external data source tables -- Key: SPARK-5833 URL: https://issues.apache.org/jira/browse/SPARK-5833 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Cheng Lian This command can be used to refresh (possibly cached) metadata stored in external data source tables. For example, for JSON tables, it forces schema inference; for Parquet tables, it forces schema merging and partition discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
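A sketch of the intended usage, assuming the command is issued as plain SQL against a hypothetical table:
{code}
// Force schema inference / schema merging and partition discovery to be
// redone the next time the table is read.
sqlContext.sql("REFRESH TABLE my_json_table")
{code}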
[jira] [Commented] (SPARK-5745) Allow using a custom TaskMetrics implementation
[ https://issues.apache.org/jira/browse/SPARK-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322594#comment-14322594 ] Jacek Lewandowski commented on SPARK-5745: -- Thanks [~pwendell] for your reply. The primary goal is to associate some additional data with the task, which can then be collected by a driver-side listener. The data I'd like to collect is not directly accessible to the user - say, I want to collect the number of rows fetched from the database, or the number of batches written to the database. These values are known inside the job code and can easily be reported to task metrics (just like the numbers of read/written bytes are reported now). If I understand the idea of accumulators correctly, although they are a great feature for application-specific metrics, I don't really know how to use them to collect more general metrics - like RDD / job execution metrics that are part of an intermediate framework or a library. Allow using a custom TaskMetrics implementation -- Key: SPARK-5745 URL: https://issues.apache.org/jira/browse/SPARK-5745 Project: Spark Issue Type: Wish Components: Spark Core Reporter: Jacek Lewandowski There can be various RDD implementations, and {{TaskMetrics}} provides a great API for collecting and aggregating metrics. However, some RDDs may want to register custom metrics (for example, the number of rows read), and the current implementation doesn't allow for this. I suppose this can be changed without modifying the whole interface - a factory could be used to create the initial {{TaskMetrics}} object, and the default factory could be overridden by the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
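For comparison, a minimal sketch of the accumulator approach discussed above; myDatabaseRDD and the counter name are hypothetical:
{code}
// Named accumulator, readable on the driver after an action runs.
val rowsFetched = sc.accumulator(0L, "rows fetched from database")

val rows = myDatabaseRDD.map { row =>
  rowsFetched += 1L  // incremented on executors as rows flow through
  row
}
rows.count()                  // run an action so the increments happen
println(rowsFetched.value)    // read the total on the driver
{code}
As the comment notes, this works well for application-level counters but is awkward when the metric belongs to an intermediate library rather than the user's own job code.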
[jira] [Commented] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322611#comment-14322611 ] Florian Verhein commented on SPARK-5813: I think it's a good idea to stick to vendor recommendations, but since I can't point to any concrete benefits and there is complexity around handling licensing issues, I don't think there's a good argument for tackling this. Spark-ec2: Switch to OracleJDK -- Key: SPARK-5813 URL: https://issues.apache.org/jira/browse/SPARK-5813 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Florian Verhein Priority: Minor Currently using OpenJDK; however, it is generally recommended to use the Oracle JDK, especially for Hadoop deployments, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
Littlestar created SPARK-5834: - Summary: spark 1.2.1 official package bundled with httpclient 4.1.2 is too old Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar In assembly-1.1.1-hadoop2.4.0.jar, the class HttpPatch, which was introduced in 4.2, is not there. I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules required httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5834: -- Description: I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules required httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. was: In assembly-1.1.1-hadoop2.4.0.jar, the class HttpPatch, which was introduced in 4.2, is not there. I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules required httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. Priority: Minor (was: Major) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules required httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5833) Adds REFRESH TABLE command to refresh external data source tables
[ https://issues.apache.org/jira/browse/SPARK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322595#comment-14322595 ] Apache Spark commented on SPARK-5833: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/4624 Adds REFRESH TABLE command to refresh external data source tables -- Key: SPARK-5833 URL: https://issues.apache.org/jira/browse/SPARK-5833 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Cheng Lian This command can be used to refresh (possibly cached) metadata stored in external data source tables. For example, for JSON tables, it forces schema inference; for Parquet tables, it forces schema merging and partition discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5813) Spark-ec2: Switch to OracleJDK
[ https://issues.apache.org/jira/browse/SPARK-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Verhein closed SPARK-5813. -- Resolution: Won't Fix Spark-ec2: Switch to OracleJDK -- Key: SPARK-5813 URL: https://issues.apache.org/jira/browse/SPARK-5813 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Florian Verhein Priority: Minor Currently using OpenJDK; however, it is generally recommended to use the Oracle JDK, especially for Hadoop deployments, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322614#comment-14322614 ] Beniamino commented on SPARK-2344: -- Hi everybody, I'm currently working on a Fuzzy C-Means implementation too. I have a first draft of my code here: https://github.com/bdelpizzo/mllib-extension/blob/master/clustering/FCM.scala I'm still working on it. I would really appreciate any suggestions. Thanks Add Fuzzy C-Means algorithm to MLlib Key: SPARK-2344 URL: https://issues.apache.org/jira/browse/SPARK-2344 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Alex Priority: Minor Original Estimate: 1m Remaining Estimate: 1m I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. FCM is very similar to K-Means, which is already implemented; they differ only in the degree of relationship each point has with each cluster (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1). As part of the implementation I would like to: - create a base class for K-Means and FCM - implement the relationship for each algorithm differently (in its own class) I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
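For reference, a self-contained sketch of the FCM membership degree u(i, j) = 1 / sum_k (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1)); this is an illustration of the standard formula, not code from either linked branch:
{code}
// Squared Euclidean distance between two points.
def sqdist(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => val d = x - y; d * d }.sum

// Membership of point x in cluster j, for fuzzifier m > 1.
// Assumes x does not coincide with any center (all distances nonzero).
def membership(x: Array[Double], centers: Array[Array[Double]], j: Int, m: Double): Double = {
  val dj = sqdist(x, centers(j))
  // With squared distances, the exponent 2/(m-1) becomes 1/(m-1).
  1.0 / centers.map(c => math.pow(dj / sqdist(x, c), 1.0 / (m - 1))).sum
}
{code}
As m approaches 1 the memberships harden toward K-Means' 0/1 assignments, which is the relationship the description above alludes to.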
[jira] [Resolved] (SPARK-5834) spark 1.2.1 official package bundled with httpclient 4.1.2 is too old
[ https://issues.apache.org/jira/browse/SPARK-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5834. -- Resolution: Not a Problem Spark doesn't actually use HttpClient at all; its dependencies do. You're looking at a dependency update specific to the Kinesis ASL build, which is not enabled in the build you downloaded. You would not depend on Spark's copy of this lib anyway; you depend on the version you need. spark 1.2.1 official package bundled with httpclient 4.1.2 is too old -- Key: SPARK-5834 URL: https://issues.apache.org/jira/browse/SPARK-5834 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor I see spark-1.2.1-bin-hadoop2.4.tgz\spark-1.2.1-bin-hadoop2.4\lib\spark-assembly-1.2.1-hadoop2.4.0.jar\org\apache\http\version.properties. It indicates that the official package only uses httpclient 4.1.2. Some Spark modules require httpclient 4.2 or above. https://github.com/apache/spark/pull/2489/files (<commons.httpclient.version>4.2</commons.httpclient.version>) https://github.com/apache/spark/pull/2535/files (<commons.httpclient.version>4.2.6</commons.httpclient.version>) I think httpclient 4.1.2 is too old; the standard distribution may conflict with user applications that require a different httpclient. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2344) Add Fuzzy C-Means algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322658#comment-14322658 ] Alex commented on SPARK-2344: - Hi, I'm also working on an implementation of FCM. You can find my work here: https://github.com/salexln/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/clustering Alex Add Fuzzy C-Means algorithm to MLlib Key: SPARK-2344 URL: https://issues.apache.org/jira/browse/SPARK-2344 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Alex Priority: Minor Original Estimate: 1m Remaining Estimate: 1m I would like to add an FCM (Fuzzy C-Means) algorithm to MLlib. FCM is very similar to K-Means, which is already implemented; they differ only in the degree of relationship each point has with each cluster (in FCM the relationship is in the range [0..1], whereas in K-Means it is 0/1). As part of the implementation I would like to: - create a base class for K-Means and FCM - implement the relationship for each algorithm differently (in its own class) I'd like this to be assigned to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5829) JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found
[ https://issues.apache.org/jira/browse/SPARK-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5829. -- Resolution: Duplicate Same as SPARK-3228, which is WontFix. The behavior is intended. You can actually copy the saveAs* functions and change them to get the behavior you want pretty easily. JavaStreamingContext.fileStream runs repeated empty tasks when no more new files are found -- Key: SPARK-5829 URL: https://issues.apache.org/jira/browse/SPARK-5829 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Environment: spark master (1.3.0) with SPARK-5826 patch. Reporter: Littlestar Priority: Minor spark master (1.3.0) with SPARK-5826 patch. JavaStreamingContext.fileStream runs a new, empty task repeatedly when there are no more new files. To reproduce: 1. mkdir /testspark/watchdir on HDFS. 2. Run the app. 3. Put some text files into /testspark/watchdir. Every 30 seconds, the Spark log indicates that a new subtask runs, and /testspark/resultdir/ gets a new directory with empty files every 30 seconds. Even when no new files are added, it still runs a new task with an empty RDD.
{noformat}
package my.test.hadoop.spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class TestStream {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("TestStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
        jssc.checkpoint("/testspark/checkpointdir");
        Configuration jobConf = new Configuration();
        jobConf.set("my.test.fields", "fields");
        JavaPairDStream<Integer, Integer> is = jssc.fileStream("/testspark/watchdir",
                LongWritable.class, Text.class, TextInputFormat.class,
                new Function<Path, Boolean>() {
                    @Override
                    public Boolean call(Path v1) throws Exception {
                        return true;
                    }
                }, true, jobConf).mapToPair(new PairFunction<Tuple2<LongWritable, Text>, Integer, Integer>() {
                    @Override
                    public Tuple2<Integer, Integer> call(Tuple2<LongWritable, Text> arg0) throws Exception {
                        return new Tuple2<Integer, Integer>(1, 1);
                    }
                });
        JavaPairDStream<Integer, Integer> rs = is.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });
        rs.checkpoint(Durations.seconds(60));
        rs.saveAsNewAPIHadoopFiles("/testspark/resultdir/output", "suffix",
                Integer.class, Integer.class, TextOutputFormat.class);
        jssc.start();
        jssc.awaitTermination();
    }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
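A sketch in Scala of that suggestion, assuming an equivalent Scala DStream[(Int, Int)] named rs; RDD.isEmpty is available from Spark 1.3, and the path format mimics saveAsNewAPIHadoopFiles:
{code}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Skip the save entirely for batches whose RDD is empty, instead of letting
// saveAsNewAPIHadoopFiles write an empty output directory every interval.
rs.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.saveAsNewAPIHadoopFile(
      s"/testspark/resultdir/output-${time.milliseconds}.suffix",
      classOf[Integer], classOf[Integer],
      classOf[TextOutputFormat[Integer, Integer]])
  }
}
{code}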
[jira] [Created] (SPARK-5835) Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1
sam created SPARK-5835: -- Summary: Unit test causes java.io.FileNotFoundException on localhost for file broadcast_1 Key: SPARK-5835 URL: https://issues.apache.org/jira/browse/SPARK-5835 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: sam Note, I do not believe this is related to SPARK-2984 since I have speculative execution off (it's off by default in 1.0.0). I intermittently get the following stack trace in my unit tests. I'm using specs2 and I have sequential in the tests (so should not be bumping into each other), and also I have `parallelExecution in Test := false` in my `build.sbt`. This isn't a major showstopper, it just means our CI pipelines need some retry logic to workaround the erroring tests. [error] Could not run test my.test.Class: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4.0:0 failed 1 times, most recent failure: Exception failure in TID 6 on host localhost: java.io.FileNotFoundException: http://blar.blar.blar.blar:59528/broadcast_1 [error] sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1834) [error] sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:90) [error] sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1431) [error] sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1429) [error] java.security.AccessController.doPrivileged(Native Method) [error] java.security.AccessController.doPrivileged(AccessController.java:713) [error] sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1428) [error] org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196) [error] org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89) [error] sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) [error] sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [error] java.lang.reflect.Method.invoke(Method.java:483) [error] java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) [error] scala.collection.immutable.$colon$colon.readObject(List.scala:362) [error] sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) [error] sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [error] java.lang.reflect.Method.invoke(Method.java:483) [error] java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896) [error] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) [error] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) [error] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993) [error] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918) [error]