[jira] [Commented] (SPARK-10251) Some internal spark classes are not registered with kryo
[ https://issues.apache.org/jira/browse/SPARK-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738390#comment-14738390 ]

Marius Soutier commented on SPARK-10251:
----------------------------------------

Any chance for a backport to 1.4.2?

> Some internal spark classes are not registered with kryo
> --------------------------------------------------------
>
>                 Key: SPARK-10251
>                 URL: https://issues.apache.org/jira/browse/SPARK-10251
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.1
>            Reporter: Soren Macbeth
>            Assignee: Ram Sriharsha
>             Fix For: 1.6.0
>
> When running a job using kryo serialization with
> `spark.kryo.registrationRequired=true`, some internal classes are not
> registered, causing the job to die. This is still a problem when the setting
> is false (the default), because unregistered classes make serialized objects
> much more expensive, both in runtime and in the space required to store them
> in memory or on disk.
> {code}
> 15/08/25 20:28:21 WARN spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, a.b.c.d): java.lang.IllegalArgumentException: Class is not registered: scala.Tuple2[]
> Note: To register this class use: kryo.register(scala.Tuple2[].class);
> 	at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> 	at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> 	at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> 	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> 	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:250)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:236)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
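Until a fix is available, the class named in the stack trace can be registered from user code. A minimal sketch, assuming the job builds its own SparkConf (only the config keys and the scala.Tuple2[] class come from the report; the app name and everything else is illustrative):

{code}
import org.apache.spark.SparkConf

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-registration-sketch") // illustrative
      // Enable kryo and fail fast on unregistered classes, as in the report.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")

    // Register the scala.Tuple2[] array class named in the exception; repeat
    // for any further "Class is not registered" errors the job surfaces.
    conf.registerKryoClasses(Array(
      classOf[Array[Tuple2[Any, Any]]]
    ))
  }
}
{code}

Kryo keys registrations on the runtime class (here `[Lscala.Tuple2;`), so the type parameters used for `classOf` do not matter.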
[jira] [Commented] (SPARK-7600) Stopping Streaming Context (sometimes) crashes master
[ https://issues.apache.org/jira/browse/SPARK-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553999#comment-14553999 ]

Marius Soutier commented on SPARK-7600:
---------------------------------------

Yes it is, that might be the problem indeed.

> Stopping Streaming Context (sometimes) crashes master
> -----------------------------------------------------
>
>                 Key: SPARK-7600
>                 URL: https://issues.apache.org/jira/browse/SPARK-7600
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.1
>            Reporter: Marius Soutier
>
> In my streaming job (which uses actorStreams), I stop the StreamingContext via
> ssc.stop(stopSparkContext = true, stopGracefully = true). Sometimes this leaves
> the Spark master in a permanent error state that just displays an error page
> instead of the UI. The following is logged when trying to access the master UI:
> {code}
> 15/05/13 15:57:15 WARN jetty.servlet.ServletHandler: /
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> 	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> 	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> 	at scala.concurrent.Await$.result(package.scala:107)
> 	at org.apache.spark.deploy.master.ui.MasterPage.render(MasterPage.scala:47)
> 	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
> 	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
> 	at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:69)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
> 	at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
> 	at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
> 	at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
> 	at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
> 	at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
> 	at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> 	at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> 	at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> 	at org.spark-project.jetty.server.Server.handle(Server.java:370)
> 	at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
> 	at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
> 	at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
> 	at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
> 	at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> 	at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> 	at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
> 	at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
> 	at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> 	at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
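The shutdown call from the report, shown in context; a minimal sketch assuming a Spark 1.x StreamingContext (the app name and batch interval are illustrative; only the ssc.stop parameters come from the issue):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingShutdownSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-shutdown-sketch") // illustrative
    val ssc  = new StreamingContext(conf, Seconds(60))                 // illustrative interval

    // ... actor-based input streams and processing would be wired up here ...

    ssc.start()

    // The call from the report: also stop the underlying SparkContext, and
    // let in-flight batches complete before tearing down the receivers.
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
{code}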
[jira] [Comment Edited] (SPARK-7600) Stopping Streaming Context (sometimes) crashes master
[ https://issues.apache.org/jira/browse/SPARK-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553999#comment-14553999 ]

Marius Soutier edited comment on SPARK-7600 at 5/21/15 9:52 AM:
----------------------------------------------------------------

Yes it is, that might be the problem indeed. However, this particular problem happens when stopping the job.

was (Author: msoutier):
Yes it is, that might be the problem indeed.

> Stopping Streaming Context (sometimes) crashes master
> -----------------------------------------------------
>
>                 Key: SPARK-7600
>                 URL: https://issues.apache.org/jira/browse/SPARK-7600
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.1
>            Reporter: Marius Soutier
>
> (Issue description and Jetty stack trace identical to the previous SPARK-7600 message.)
[jira] [Created] (SPARK-7600) Stopping Streaming Context (sometimes) crashes master
Marius Soutier created SPARK-7600:
-------------------------------------

             Summary: Stopping Streaming Context (sometimes) crashes master
                 Key: SPARK-7600
                 URL: https://issues.apache.org/jira/browse/SPARK-7600
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.3.1
            Reporter: Marius Soutier

In my streaming job (which uses actorStreams), I stop the StreamingContext via ssc.stop(stopSparkContext = true, stopGracefully = true). Sometimes this leaves the Spark master in a permanent error state that just displays an error page instead of the UI. The following is logged when trying to access the master UI:

(Jetty TimeoutException stack trace identical to the one quoted in the earlier SPARK-7600 messages.)
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541951#comment-14541951 ]

Marius Soutier commented on SPARK-6613:
---------------------------------------

It's still happening with 1.3.1.

> Starting stream from checkpoint causes Streaming tab to throw error
> -------------------------------------------------------------------
>
>                 Key: SPARK-6613
>                 URL: https://issues.apache.org/jira/browse/SPARK-6613
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1, 1.2.2, 1.3.1
>            Reporter: Marius Soutier
>
> When continuing my streaming job from a checkpoint, the job runs, but the
> Streaming tab in the standard UI initially no longer works (the browser just
> shows HTTP ERROR: 500). Sometimes it returns to normal after a while, and
> sometimes it stays in this state permanently. Stacktrace:
> {code}
> WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
> java.util.NoSuchElementException: key not found: 0
> 	at scala.collection.MapLike$class.default(MapLike.scala:228)
> 	at scala.collection.AbstractMap.default(Map.scala:58)
> 	at scala.collection.MapLike$class.apply(MapLike.scala:141)
> 	at scala.collection.AbstractMap.apply(Map.scala:58)
> 	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
> 	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 	at scala.collection.immutable.Range.foreach(Range.scala:141)
> 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> 	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> 	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
> 	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
> 	at scala.Option.map(Option.scala:145)
> 	at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
> 	at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
> 	at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
> 	at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
> 	at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
> 	at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
> 	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:370)
> 	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
> 	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
> 	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
> 	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
> 	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> 	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> 	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
> 	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Updated] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marius Soutier updated SPARK-6613:
----------------------------------
    Affects Version/s: 1.3.1

> Starting stream from checkpoint causes Streaming tab to throw error
> -------------------------------------------------------------------
>
>                 Key: SPARK-6613
>                 URL: https://issues.apache.org/jira/browse/SPARK-6613
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1, 1.2.2, 1.3.1
>            Reporter: Marius Soutier
>
> (Issue description and stack trace identical to the previous SPARK-6613 message.)
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532663#comment-14532663 ]

Marius Soutier commented on SPARK-3928:
---------------------------------------

DataFrames now expect varagrs, i.e. df.parquetFile(/path/to/file/1,/path/to/file/2).

> Support wildcard matches on Parquet files
> -----------------------------------------
>
>                 Key: SPARK-3928
>                 URL: https://issues.apache.org/jira/browse/SPARK-3928
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
>             Fix For: 1.3.0
>
> {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}.
> It would be nice if {{SparkContext.parquetFile()}} did the same.
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532704#comment-14532704 ]

Marius Soutier commented on SPARK-3928:
---------------------------------------

Wildcards were never supported and it seems they don't intend to change that. :(

> Support wildcard matches on Parquet files
> -----------------------------------------
>
>                 Key: SPARK-3928
> (Issue description identical to the previous SPARK-3928 message.)
[jira] [Comment Edited] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532663#comment-14532663 ]

Marius Soutier edited comment on SPARK-3928 at 5/7/15 1:54 PM:
---------------------------------------------------------------

DataFrames now expect varargs, i.e. df.parquetFile(/path/to/file/1,/path/to/file/2).

was (Author: msoutier):
DataFrames now expect varagrs, i.e. df.parquetFile(/path/to/file/1,/path/to/file/2).

> Support wildcard matches on Parquet files
> -----------------------------------------
>
>                 Key: SPARK-3928
> (Issue description identical to the previous SPARK-3928 message.)
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532969#comment-14532969 ]

Marius Soutier commented on SPARK-3928:
---------------------------------------

Because parquetFile now takes a varargs parameter, which in turn is combined into a single path using mkString(","). This works just as before. The PR you link to still uses the old method with a single String parameter. It probably got lost in translation.

> Support wildcard matches on Parquet files
> -----------------------------------------
>
>                 Key: SPARK-3928
> (Issue description identical to the previous SPARK-3928 message.)
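The varargs call discussed in the comments above, sketched against the Spark 1.3-era API. Note the comments write df.parquetFile for brevity; in that API the method lives on SQLContext, which joins the varargs paths into one comma-separated string internally. The paths are illustrative:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object ParquetVarargsSketch {
  def main(args: Array[String]): Unit = {
    val sc         = new SparkContext() // configured via spark-submit
    val sqlContext = new SQLContext(sc)

    // Spark 1.3+: parquetFile takes varargs and combines them into a single
    // comma-separated path internally, so reading multiple files still works.
    val df = sqlContext.parquetFile("/path/to/file/1", "/path/to/file/2")
    df.printSchema()
  }
}
{code}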
[jira] [Created] (SPARK-7167) Receivers are not distributed efficiently
Marius Soutier created SPARK-7167:
-------------------------------------

             Summary: Receivers are not distributed efficiently
                 Key: SPARK-7167
                 URL: https://issues.apache.org/jira/browse/SPARK-7167
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.1, 1.2.2
            Reporter: Marius Soutier

Bug report: I'm seeing an issue where, after starting a streaming application from a checkpoint, the network receivers are distributed such that not all nodes are used. For example, with five nodes:

node0 - 1 receiver
node1 - 2 receivers
node2 - 0 receivers
node3 - 2 receivers
node4 - 0 receivers

This slows down the job, waiting batches pile up, and I have to kill and restart it, hoping that next time it will be distributed in a sensible fashion.
[jira] [Updated] (SPARK-7167) Receivers are not distributed efficiently when starting from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marius Soutier updated SPARK-7167:
----------------------------------
    Summary: Receivers are not distributed efficiently when starting from checkpoint  (was: Receivers are not distributed efficiently)

> Receivers are not distributed efficiently when starting from checkpoint
> -----------------------------------------------------------------------
>
>                 Key: SPARK-7167
>                 URL: https://issues.apache.org/jira/browse/SPARK-7167
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1, 1.2.2
>            Reporter: Marius Soutier
>
> (Issue description identical to the previous SPARK-7167 message.)
[jira] [Updated] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marius Soutier updated SPARK-6613:
----------------------------------
    Affects Version/s: 1.2.2

> Starting stream from checkpoint causes Streaming tab to throw error
> -------------------------------------------------------------------
>
>                 Key: SPARK-6613
>                 URL: https://issues.apache.org/jira/browse/SPARK-6613
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1, 1.2.2
>            Reporter: Marius Soutier
>
> (Issue description and stack trace identical to the earlier SPARK-6613 messages.)
[jira] [Comment Edited] (SPARK-7167) Receivers are not distributed efficiently when starting from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513976#comment-14513976 ] Marius Soutier edited comment on SPARK-7167 at 4/27/15 12:09 PM: - Maybe the slowdown is only incidental, though it's odd at a batch interval of 1 minute and 40-50 records per interval. In my case I have an actor system running on each worker node that receives data and forwards it to a registered actor receiver (ssc.actorStream(...)), so this results in additional network traffic, but that should not be a problem at 10 Gbit. (I'm also aware that actorStream is not really a production-ready feature.) But in any case, from the documentation: "For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers [...]" So receivers should be distributed equally on the cluster, and this appears to be a bug. I also noticed the receivers get redistributed all the time.
Receivers are not distributed efficiently when starting from checkpoint
---
Key: SPARK-7167
URL: https://issues.apache.org/jira/browse/SPARK-7167
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1, 1.2.2
Reporter: Marius Soutier
Priority: Minor

Bug report: I'm seeing an issue where after starting a streaming application from a checkpoint, the network receivers are distributed such that not all nodes are used. For example, I have five nodes:

node0 - 1 receiver
node1 - 2 receivers
node2 - 0 receivers
node3 - 2 receivers
node4 - 0 receivers

This slows down the job, waiting batches pile up, and I have to kill and restart it, hoping that next time it will be distributed in a sensible fashion.
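The imbalance reported above is easy to detect mechanically. A minimal sketch (plain Python, hypothetical helper name, using exactly the node counts from the report):

```python
# Receiver placement as reported in the bug: node -> receiver count.
placement = {"node0": 1, "node1": 2, "node2": 0, "node3": 2, "node4": 0}

def idle_nodes(placement):
    """Nodes running no receiver at all; a balanced schedule has none."""
    return sorted(n for n, c in placement.items() if c == 0)

total = sum(placement.values())  # 5 receivers across 5 nodes
# An even spread would put one receiver per node, yet two nodes sit idle
# while node1 and node3 each run two receivers.
print(total, idle_nodes(placement))
```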
[jira] [Commented] (SPARK-7167) Receivers are not distributed efficiently when starting from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513976#comment-14513976 ] Marius Soutier commented on SPARK-7167: --- Maybe the slowdown is only incidental, though it's odd at a batch interval of 1 minute and 40-50 records per interval. In my case I have an actor system running on each worker node that receives data and forwards it to a registered actor receiver (ssc.actorStream(...)), so this results in additional network traffic, but that should not be a problem at 10 Gbit. (I'm also aware that actorStream is not really a production-ready feature.) But in any case, from the documentation: "For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers [...]" So receivers should be distributed equally on the cluster, and this appears to be a bug.
[jira] [Updated] (SPARK-7167) Receivers are not distributed efficiently when starting from checkpoint
[ https://issues.apache.org/jira/browse/SPARK-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marius Soutier updated SPARK-7167: -- Attachment: Screen Shot 2015-04-27 at 14.10.05.jpg
[jira] [Updated] (SPARK-7028) Add filterNot to RDD
[ https://issues.apache.org/jira/browse/SPARK-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marius Soutier updated SPARK-7028: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug)

Add filterNot to RDD
Key: SPARK-7028
URL: https://issues.apache.org/jira/browse/SPARK-7028
Project: Spark
Issue Type: Improvement
Reporter: Marius Soutier
Priority: Minor

The Scala collection APIs have not only `filter`, but also `filterNot` for convenience and readability. I'd suggest adding the same to RDD. I can submit a PR.
[jira] [Created] (SPARK-7028) Add filterNot to RDD
Marius Soutier created SPARK-7028: - Summary: Add filterNot to RDD
Key: SPARK-7028
URL: https://issues.apache.org/jira/browse/SPARK-7028
Project: Spark
Issue Type: Bug
Reporter: Marius Soutier

The Scala collection APIs have not only `filter`, but also `filterNot` for convenience and readability. I'd suggest adding the same to RDD. I can submit a PR.
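The requested convenience is a trivial wrapper over filter: filterNot(p) keeps exactly the elements that filter(p) drops. A minimal sketch of the intended semantics (illustrated in plain Python rather than Scala, with a hypothetical helper name):

```python
def filter_not(predicate, items):
    """Complement of filter: keep the elements the predicate rejects."""
    return [x for x in items if not predicate(x)]

is_even = lambda x: x % 2 == 0
nums = [1, 2, 3, 4, 5]
kept = [x for x in nums if is_even(x)]  # filter    -> [2, 4]
dropped = filter_not(is_even, nums)     # filterNot -> [1, 3, 5]
```

Every element lands in exactly one of the two results, which is what makes the method pure readability sugar over `filter(x => !p(x))`.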
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392234#comment-14392234 ] Marius Soutier commented on SPARK-6613: --- It's a combination of actorStreams and StreamingContext.getOrCreate(). I've started to update the actorStream example from spark-examples, but it will take some more time to complete it. I'll post the code here.

Starting stream from checkpoint causes Streaming tab to throw error
---
Key: SPARK-6613
URL: https://issues.apache.org/jira/browse/SPARK-6613
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Marius Soutier

When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (browser just shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and sometimes it stays in this state permanently. Stacktrace:

WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:58)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:58)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.Range.foreach(Range.scala:141)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
  at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
  at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390423#comment-14390423 ] Marius Soutier commented on SPARK-6613: --- Bug report.
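The NoSuchElementException in the trace comes from a bare Map.apply on a per-stream map that has no entry for stream id 0 after the checkpoint restore. A defaulting lookup (getOrElse in Scala, dict.get in Python) avoids the crash. A minimal sketch of the failure mode and the defensive pattern (hypothetical stand-in data, not the listener's actual code):

```python
# After a checkpoint restore the listener's per-stream record map can be
# empty, so direct indexing on stream id 0 blows up.
records_per_stream = {}  # hypothetical stand-in for the listener state

def unsafe_lookup(stream_id):
    # Equivalent of Scala's map(stream_id): raises KeyError (Scala:
    # NoSuchElementException) when the key is absent.
    return records_per_stream[stream_id]

def safe_lookup(stream_id):
    # Equivalent of Scala's map.getOrElse(stream_id, 0L): falls back to a
    # zero count instead of raising.
    return records_per_stream.get(stream_id, 0)
```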
[jira] [Created] (SPARK-6648) Reading Parquet files with different sub-files doesn't work
Marius Soutier created SPARK-6648: - Summary: Reading Parquet files with different sub-files doesn't work
Key: SPARK-6648
URL: https://issues.apache.org/jira/browse/SPARK-6648
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.1
Reporter: Marius Soutier

When reading from multiple parquet files (via sqlContext.parquetFile("/path/1.parquet", "/path/2.parquet")), if the parquet files were created using a different coalesce, the reading fails with:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path path
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetRelation.init(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet but will report back after upgrading to 1.3.1 (as soon as it's released).
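The trigger described here is that the two input directories no longer carry matching part-file layouts once one is rewritten with a different coalesce(). A toy consistency check over part-file listings illustrates the situation (an illustrative assumption with hypothetical names, not Spark's actual metadata-discovery code):

```python
# Hypothetical listings of the two Parquet output directories from the
# report: one written with coalesce(1), the other rewritten with three
# part files.
dir_a = ["part-r-1.parquet"]
dir_b = ["part-r-1.parquet", "part-r-2.parquet", "part-r-3.parquet"]

def same_layout(*listings):
    """True when every input directory contains the same set of part files."""
    sets = [set(listing) for listing in listings]
    return all(s == sets[0] for s in sets)

# The mismatched layouts correspond to the situation the bug report
# describes, where metadata discovery then fails on one of the paths.
print(same_layout(dir_a, dir_b))
```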
[jira] [Updated] (SPARK-6648) Reading Parquet files with different sub-files doesn't work
[ https://issues.apache.org/jira/browse/SPARK-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marius Soutier updated SPARK-6648: -- Description: When reading from multiple parquet files (via sqlContext.parquetFile("/path/1.parquet", "/path/2.parquet")), and one of the parquet files is being overwritten using a different coalesce (e.g. one only contains part-r-1.parquet, the other also part-r-2.parquet, part-r-3.parquet), the reading fails with:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path path
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetRelation.init(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet but will report back after upgrading to 1.3.1 (as soon as it's released).

was: When reading from multiple parquet files (via sqlContext.parquetFile("/path/1.parquet", "/path/2.parquet")), if the parquet files were created using a different coalesce (e.g. one only contains part-r-1.parquet, the other also part-r-2.parquet, part-r-3.parquet), the reading fails with:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path path
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.parquet.ParquetRelation.init(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
  at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet but will report back after upgrading to 1.3.1 (as soon as it's released).
[jira] [Updated] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marius Soutier updated SPARK-6613: -- Description: When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (browser just shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and sometimes it stays in this state permanently. Stacktrace:

WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:58)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:58)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.Range.foreach(Range.scala:141)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
  at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
  at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

was: When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (browser just shows HTTP ERROR: 500). After a while, it gets back to normal, at least most of the time (sometimes it doesn't work at all, but that's rare). Stacktrace:

WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
  at scala.collection.MapLike$class.default(MapLike.scala:228)
[jira] [Updated] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marius Soutier updated SPARK-6613: -- Description: When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI initially no longer works (browser just shows HTTP ERROR: 500). After a while, it gets back to normal, at least most of the time (sometimes it doesn't work at all, but that's rare). Stacktrace:

WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:58)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
  at scala.collection.AbstractMap.apply(Map.scala:58)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.Range.foreach(Range.scala:141)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
  at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
  at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
  at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:370)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
  at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:745)

was: When continuing my streaming job from a checkpoint, the job runs, but the Streaming tab in the standard UI no longer works (browser just shows HTTP ERROR: 500). Stacktrace:

WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:58)
[jira] [Created] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
Marius Soutier created SPARK-6613:
-------------------------------------

             Summary: Starting stream from checkpoint causes Streaming tab to throw error
                 Key: SPARK-6613
                 URL: https://issues.apache.org/jira/browse/SPARK-6613
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.2.1
            Reporter: Marius Soutier

When continuing my streaming job from a checkpoint, it works, but the Streaming tab in the standard UI no longer works (the browser just shows HTTP ERROR: 500).

Stacktrace:
{code}
WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/
java.util.NoSuchElementException: key not found: 0
	at scala.collection.MapLike$class.default(MapLike.scala:228)
	at scala.collection.AbstractMap.default(Map.scala:58)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:58)
	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151)
	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.Range.foreach(Range.scala:141)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150)
	at org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149)
	at org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82)
	at org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43)
	at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
	at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68)
	at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
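The `key not found: 0` at the top of the trace indicates that the listener indexes a per-receiver map with `Map.apply` for a receiver id that is not present after checkpoint recovery. A minimal pure-Scala model of that failure mode (hypothetical names, not Spark's internal code):

```scala
// Hypothetical stand-in for the listener's receiver-id -> record-count map,
// which is empty right after recovering from a checkpoint.
val receivedRecords = Map.empty[Int, Long]

// Map.apply throws NoSuchElementException for a missing key -- this is the
// "key not found: 0" that surfaces as the HTTP 500 in the Streaming tab.
val threw =
  try { receivedRecords(0); false }
  catch { case _: NoSuchElementException => true }
assert(threw)

// A defensive lookup sidesteps the error by defaulting to zero records:
assert(receivedRecords.getOrElse(0, 0L) == 0L)
```

This also matches the reported symptom that the page recovers "after a while": once batches complete, the key exists and `apply` succeeds again.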
[jira] [Commented] (SPARK-6504) Cannot read Parquet files generated from different versions at once
[ https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382073#comment-14382073 ]

Marius Soutier commented on SPARK-6504:
---------------------------------------

Not easily, but 1.3.1 is supposed to be released soon, right?

> Cannot read Parquet files generated from different versions at once
> -------------------------------------------------------------------
>
>                 Key: SPARK-6504
>                 URL: https://issues.apache.org/jira/browse/SPARK-6504
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the
> same time via `sqlContext.parquetFile("fileFrom1.1.parquet,fileFrom1.2.parquet")`,
> an exception occurs:
> {code}
> could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
> [{type:struct,fields:[{name:date,type:string,nullable:true,metadata:{}},{name:account,type:string,nullable:true,metadata:{}},{name:impressions,type:long,nullable:false,metadata:{}},{name:cost,type:double,nullable:false,metadata:{}},{name:clicks,type:long,nullable:false,metadata:{}},{name:conversions,type:long,nullable:false,metadata:{}},{name:orderValue,type:double,nullable:false,metadata:{}}]},
>  StructType(List(StructField(date,StringType,true), StructField(account,StringType,true), StructField(impressions,LongType,false), StructField(cost,DoubleType,false), StructField(clicks,LongType,false), StructField(conversions,LongType,false), StructField(orderValue,DoubleType,false)))]
> {code}
> The schema is exactly equal.
[jira] [Commented] (SPARK-6504) Cannot read Parquet files generated from different versions at once
[ https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382027#comment-14382027 ]

Marius Soutier commented on SPARK-6504:
---------------------------------------

No, as far as I understand, Spark 1.3 cannot read Parquet files created with 1.1.x at all.

> Cannot read Parquet files generated from different versions at once
> -------------------------------------------------------------------
>
>                 Key: SPARK-6504
>                 URL: https://issues.apache.org/jira/browse/SPARK-6504
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the
> same time via `sqlContext.parquetFile("fileFrom1.1.parquet,fileFrom1.2.parquet")`,
> an exception occurs:
> {code}
> could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
> [{type:struct,fields:[{name:date,type:string,nullable:true,metadata:{}},{name:account,type:string,nullable:true,metadata:{}},{name:impressions,type:long,nullable:false,metadata:{}},{name:cost,type:double,nullable:false,metadata:{}},{name:clicks,type:long,nullable:false,metadata:{}},{name:conversions,type:long,nullable:false,metadata:{}},{name:orderValue,type:double,nullable:false,metadata:{}}]},
>  StructType(List(StructField(date,StringType,true), StructField(account,StringType,true), StructField(impressions,LongType,false), StructField(cost,DoubleType,false), StructField(clicks,LongType,false), StructField(conversions,LongType,false), StructField(orderValue,DoubleType,false)))]
> {code}
> The schema is exactly equal.
[jira] [Created] (SPARK-6504) Cannot read Parquet files generated from different versions at once
Marius Soutier created SPARK-6504:
-------------------------------------

             Summary: Cannot read Parquet files generated from different versions at once
                 Key: SPARK-6504
                 URL: https://issues.apache.org/jira/browse/SPARK-6504
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.2.1
            Reporter: Marius Soutier

When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the same time via `sqlContext.parquetFile("fileFrom1.1.parquet,fileFrom1.2.parquet")`, an exception occurs:
{code}
could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
[{type:struct,fields:[{name:date,type:string,nullable:true,metadata:{}},{name:account,type:string,nullable:true,metadata:{}},{name:impressions,type:long,nullable:false,metadata:{}},{name:cost,type:double,nullable:false,metadata:{}},{name:clicks,type:long,nullable:false,metadata:{}},{name:conversions,type:long,nullable:false,metadata:{}},{name:orderValue,type:double,nullable:false,metadata:{}}]},
 StructType(List(StructField(date,StringType,true), StructField(account,StringType,true), StructField(impressions,LongType,false), StructField(cost,DoubleType,false), StructField(clicks,LongType,false), StructField(conversions,LongType,false), StructField(orderValue,DoubleType,false)))]
{code}
The schema is exactly equal.
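The conflicting values in the error are the same logical schema serialized two different ways (a JSON form on one side, a `StructType(List(...))` rendering on the other), and Parquet merges the footer key/value metadata by comparing the raw strings. A pure-Scala illustration of why that merge fails even though "the schema is exactly equal" (shortened, hypothetical values):

```scala
// Two serializations of the same one-column schema, mimicking the two forms
// that appear in the error message above (values are illustrative only):
val from12 = "{type:struct,fields:[{name:date,type:string,nullable:true}]}"
val from11 = "StructType(List(StructField(date,StringType,true)))"

// The metadata merge compares the strings, not the parsed schemas, so these
// count as "conflicting values" for the same metadata key ...
assert(from11 != from12)
// ... even though both describe a nullable string column named "date".
```

This suggests the fix belongs in how Spark writes or normalizes the `org.apache.spark.sql.parquet.row.metadata` value, not in the files themselves.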
[jira] [Comment Edited] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364713#comment-14364713 ]

Marius Soutier edited comment on SPARK-6304 at 3/17/15 7:36 AM:
----------------------------------------------------------------

Got it, thanks. In my tests it was never set automatically, so this must be set at some later point.

was (Author: msoutier): Got it, thanks.

> Checkpointing doesn't retain driver port
> ----------------------------------------
>
>                 Key: SPARK-6304
>                 URL: https://issues.apache.org/jira/browse/SPARK-6304
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> In a check-pointed Streaming application running on a fixed driver port, the
> setting spark.driver.port is not loaded when recovering from a checkpoint.
> (The driver is then started on a random port.)
[jira] [Commented] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364713#comment-14364713 ]

Marius Soutier commented on SPARK-6304:
---------------------------------------

Got it, thanks.

> Checkpointing doesn't retain driver port
> ----------------------------------------
>
>                 Key: SPARK-6304
>                 URL: https://issues.apache.org/jira/browse/SPARK-6304
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> In a check-pointed Streaming application running on a fixed driver port, the
> setting spark.driver.port is not loaded when recovering from a checkpoint.
> (The driver is then started on a random port.)
[jira] [Commented] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363047#comment-14363047 ]

Marius Soutier commented on SPARK-6304:
---------------------------------------

Yeah, but if the user doesn't set the port, why remove it? When Spark deserializes the checkpoint, the port shouldn't be set by default, right?

> Checkpointing doesn't retain driver port
> ----------------------------------------
>
>                 Key: SPARK-6304
>                 URL: https://issues.apache.org/jira/browse/SPARK-6304
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> In a check-pointed Streaming application running on a fixed driver port, the
> setting spark.driver.port is not loaded when recovering from a checkpoint.
> (The driver is then started on a random port.)
[jira] [Commented] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362928#comment-14362928 ]

Marius Soutier commented on SPARK-6304:
---------------------------------------

I'm just reporting the bug. As you said, the code explicitly removes spark.driver.host and spark.driver.port when recovering from a checkpoint, so I'd first like to understand why that is.

> Checkpointing doesn't retain driver port
> ----------------------------------------
>
>                 Key: SPARK-6304
>                 URL: https://issues.apache.org/jira/browse/SPARK-6304
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> In a check-pointed Streaming application running on a fixed driver port, the
> setting spark.driver.port is not loaded when recovering from a checkpoint.
> (The driver is then started on a random port.)
[jira] [Commented] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362918#comment-14362918 ]

Marius Soutier commented on SPARK-6304:
---------------------------------------

Simple, I'm using `actorStream` and want to send data to it via remoting, so I need a fixed port to send to. As a workaround I'm now starting a second ActorSystem, but it seems to have issues communicating with Spark's ActorSystem.

> Checkpointing doesn't retain driver port
> ----------------------------------------
>
>                 Key: SPARK-6304
>                 URL: https://issues.apache.org/jira/browse/SPARK-6304
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier
>
> In a check-pointed Streaming application running on a fixed driver port, the
> setting spark.driver.port is not loaded when recovering from a checkpoint.
> (The driver is then started on a random port.)
[jira] [Created] (SPARK-6304) Checkpointing doesn't retain driver port
Marius Soutier created SPARK-6304:
-------------------------------------

             Summary: Checkpointing doesn't retain driver port
                 Key: SPARK-6304
                 URL: https://issues.apache.org/jira/browse/SPARK-6304
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.1
            Reporter: Marius Soutier

In a check-pointed Streaming application running on a fixed driver port, the setting spark.driver.port is not loaded when recovering from checkpoint. (The driver is then started on a random port.)
[jira] [Updated] (SPARK-6304) Checkpointing doesn't retain driver port
[ https://issues.apache.org/jira/browse/SPARK-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marius Soutier updated SPARK-6304:
----------------------------------
    Description:

In a check-pointed Streaming application running on a fixed driver port, the setting spark.driver.port is not loaded when recovering from a checkpoint. (The driver is then started on a random port.)

  was:

In a check-pointed Streaming application running on a fixed driver port, the setting spark.driver.port is not loaded when recovering from checkpoint. (The driver is then started on a random port.)
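Per the discussion above, the recovery path explicitly drops `spark.driver.host` and `spark.driver.port` from the restored configuration. A plain-Scala model of that behavior and the obvious workaround (re-applying the fixed port to the conf used after recovery); this uses an ordinary `Map` with hypothetical values, not Spark's internal `Checkpoint` code:

```scala
// Properties as they were written into the checkpoint (hypothetical values).
val checkpointedProps = Map(
  "spark.app.name"    -> "my-streaming-app",
  "spark.driver.port" -> "21000"
)

// What recovery effectively does to the restored properties: the driver
// host/port keys are removed, so the driver binds to a random port.
val restored = checkpointedProps -- Seq("spark.driver.host", "spark.driver.port")
assert(!restored.contains("spark.driver.port"))

// Workaround sketch: re-apply the fixed port on the conf after recovery.
val patched = restored + ("spark.driver.port" -> "21000")
assert(patched("spark.driver.port") == "21000")
```

In a real application the equivalent would be setting the port again on the `SparkConf` used by the function passed to `StreamingContext.getOrCreate`.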
[jira] [Commented] (SPARK-3928) Support wildcard matches on Parquet files
[ https://issues.apache.org/jira/browse/SPARK-3928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181185#comment-14181185 ]

Marius Soutier commented on SPARK-3928:
---------------------------------------

This would be more than nice. Currently, `parquetFile()` supports comma-separated input, but this fails when one of those inputs is not available. A wildcard should solve that and make the API more consistent with other input methods.

> Support wildcard matches on Parquet files
> -----------------------------------------
>
>                 Key: SPARK-3928
>                 URL: https://issues.apache.org/jira/browse/SPARK-3928
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> {{SparkContext.textFile()}} supports patterns like {{part-*}} and {{2014-\?\?-\?\?}}.
> It would be nice if {{SparkContext.parquetFile()}} did the same.
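Until `parquetFile()` accepts wildcards, the failure mode described in the comment (one missing entry breaking the whole comma-separated read) can be worked around by expanding and validating the candidate paths yourself. A local-filesystem sketch; the helper name is hypothetical, and on HDFS the analogous expansion would go through Hadoop's `FileSystem.globStatus`:

```scala
import java.nio.file.{Files, Paths}

// Keep only the candidate paths that actually exist, so a missing part file
// does not fail the whole comma-separated parquetFile() call.
def existingPaths(candidates: Seq[String]): Seq[String] =
  candidates.filter(p => Files.exists(Paths.get(p)))

// Usage sketch (Spark context assumed):
//   sqlContext.parquetFile(existingPaths(partFiles).mkString(","))
```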