Where are the logs for Spark Kafka on YARN on Cloudera
Hello all, I am trying to test JavaKafkaWordCount on YARN; to make sure YARN is working fine, I am saving the output to HDFS. The example works fine in local mode but not in YARN mode. When I change the mode to yarn-client or yarn-cluster, I cannot see any output logged, nor can I find any errors logged. For my application id I was looking for logs under /var/log/hadoop-yarn/containers (e.g. /var/log/hadoop-yarn/containers/application_1439517792099_0010/container_1439517792099_0010_01_03/stderr) but I cannot find any useful information there. Is that the only location where application logs are written? I also tried setting the log output to spark.yarn.app.container.log.dir but got an access denied error. Questions: Do we need some special setup to run Spark Streaming on YARN? How do we debug? Where can I find more details on testing streaming on YARN? Thanks, Rachana
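For reference: when YARN log aggregation is enabled (yarn.log-aggregation-enable set to true in yarn-site.xml), container logs are collected into HDFS after the application finishes and can be retrieved with the YARN CLI rather than read from the NodeManagers' local /var/log directories. A minimal sketch, using the application id from the question:

    yarn logs -applicationId application_1439517792099_0010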
Hive permanent functions are not available in Spark SQL
Hi, I am trying to use internal UDFs that we have added as permanent functions to Hive, from within a Spark SQL query (using HiveContext), but I encounter NoSuchObjectException, i.e. the function could not be found. However, if I execute the 'show functions' command in Spark SQL, the permanent functions appear in the list. I am using Spark 1.4.1 with Hive 0.13.1. I tried to debug this by looking at the logs and the code; it seems the 'show functions' command and the UDF query go through essentially the same code path, yet the former can see the UDF and the latter can't. Any ideas on how to debug/fix this? Thanks, pala
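For reference, a minimal sketch of the setup described above, in the Spark 1.4-era API (my_udf, the column col, and the table t are hypothetical placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("permanent-udf-repro"))
    val hiveCtx = new HiveContext(sc)

    // The permanent function registered in the Hive metastore shows up here...
    hiveCtx.sql("show functions").collect().foreach(println)

    // ...but referencing it in a query reportedly fails with NoSuchObjectException.
    hiveCtx.sql("SELECT my_udf(col) FROM t").show()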
Spark Network Module behaviour
Dear All, I am trying to understand how exactly the Spark network module works. Looking at the Netty package, I would like to intercept every server response for a block fetch. As I understand it, the place responsible for sending remote blocks is "TransportRequestHandler.processFetchRequest". I'm trying to distinguish these flows from other flows in a switch, using the information I get from the "channel" object in this function: the source and destination IP and the source and destination port; I get all four numbers from the channel object. The problem arises when I look at the switch log and see extensive traffic going back and forth that does not belong to the flows I'm capturing. My questions: am I missing other places that are responsible for heavy network traffic? Or are the port and IP I get from the channel not the correct ones? Thanks, Saman
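For reference, a minimal sketch of pulling the flow tuple out of a Netty channel (an illustrative helper under the assumptions above, not existing Spark code):

    import java.net.InetSocketAddress
    import io.netty.channel.Channel

    // Returns (local IP, local port, remote IP, remote port) for a connected channel.
    def flowTuple(channel: Channel): (String, Int, String, Int) = {
      val local  = channel.localAddress().asInstanceOf[InetSocketAddress]
      val remote = channel.remoteAddress().asInstanceOf[InetSocketAddress]
      (local.getAddress.getHostAddress, local.getPort,
       remote.getAddress.getHostAddress, remote.getPort)
    }

Note that block fetches are only one part of Spark's traffic; RPC between driver and executors and, depending on the deployment, an external shuffle service use separate connections, so a switch will legitimately see flows that never pass through processFetchRequest.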
Dynamic DAG use-case for Spark Streaming
Hi, We are using Spark Streaming as our processing engine, and as part of the output we want to push the data to a UI. There would be multiple users accessing the system with their different filters on. Based on the filters and other inputs, we want to either run a SQL query on a DStream or do custom logic processing. This would need the system to read the filters/query and generate the execution graph at runtime. I can't see any support in Spark Streaming for generating the execution graph on the fly. I think I can broadcast the query to executors and read the broadcast query at runtime, but that would also limit me to one user at a time. Do we not expect Spark Streaming to take queries/filters from the outside world? Does output in Spark Streaming only mean outputting to an external source which could then be queried? Thanks, Archit Thakur.
Too many executors are created
Dear Spark developers, I have created a simple Spark application for spark-submit. It calls a machine learning library from Spark MLlib that is executed in a number of iterations, which correspond to the same number of tasks in Spark. It seems that Spark creates an executor for each task and then removes it. The following messages in my log indicate this:

15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24463 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24463 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24463 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24463
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24464 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24464 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24464 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24464
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24465 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24465 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24465 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24465 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24465 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24465
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24466 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24466 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24466 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24466 is now RUNNING

It ends up creating and removing thousands of executors. Is this normal behavior? If I run the same code within spark-shell, this does not happen. Could you suggest what might be wrong in my settings? Best regards, Alexander
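For reference (a hedged pointer, not a definitive diagnosis): "Command exited with code 1" means the executor process died while launching, and the standalone master then grants a replacement, which can repeat indefinitely. In standalone mode each worker keeps the launch output of every executor it starts under its work directory, so the underlying error is usually visible in a file like:

    $SPARK_HOME/work/<app-id>/<executor-id>/stderr

where <app-id> and <executor-id> are placeholders for the values in the log above.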
Re: Dynamic DAG use-case for Spark Streaming
A very basic support that is there in DStream is DStream.transform(), which takes an arbitrary RDD => RDD function. This function can actually choose to do a different computation with time. That may be of help to you.

On Tue, Sep 29, 2015 at 12:06 PM, Archit Thakur wrote:
> Hi,
>
> We are using Spark Streaming as our processing engine, and as part of the
> output we want to push the data to a UI. There would be multiple users
> accessing the system with their different filters on. Based on the filters
> and other inputs, we want to either run a SQL query on a DStream or do
> custom logic processing. This would need the system to read the
> filters/query and generate the execution graph at runtime. I can't see any
> support in Spark Streaming for generating the execution graph on the fly.
> I think I can broadcast the query to executors and read the broadcast
> query at runtime, but that would also limit me to one user at a time.
>
> Do we not expect Spark Streaming to take queries/filters from the outside
> world? Does output in Spark Streaming only mean outputting to an external
> source which could then be queried?
>
> Thanks,
> Archit Thakur.
>
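A minimal sketch of the transform() approach suggested above (the AtomicReference holder that a UI layer would update is a hypothetical illustration):

    import java.util.concurrent.atomic.AtomicReference
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical thread-safe holder the UI layer updates with the latest filter.
    val currentFilter = new AtomicReference[String => Boolean]((_: String) => true)

    def withDynamicFilter(lines: DStream[String]): DStream[String] =
      lines.transform { (rdd: RDD[String]) =>
        // The function passed to transform() is re-evaluated on the driver for
        // every batch, so a newly set filter takes effect on the next batch
        // without restarting the streaming context.
        rdd.filter(currentFilter.get())
      }

This covers per-batch dynamic filtering; serving many concurrent users would still require multiplexing the per-user filters into this one function (or one output stream per filter).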
Re: Failed to run Spark sample on Windows
Not sure, so I downloaded release 1.4.1 again, with the "Hadoop 2.6 and later" option, from http://spark.apache.org/downloads.html, assuming the version is consistent, and ran the following on Windows 10:

c:\spark-1.4.1-bin-hadoop2.6>bin\run-example HdfsTest

I still got the similar exception below. (I heard there's a permission config for HDFS; if so, how do I do that?)

15/09/29 13:03:26 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:465)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:398)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:390)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:390)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

On Mon, Sep 28, 2015 at 4:39 PM, Ted Yu wrote:
> What version of hadoop are you using ?
>
> Is that version consistent with the one which was used to build Spark 1.4.0 ?
>
> Cheers
>
> On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong wrote:
>> I tried to run the HdfsTest sample on Windows with spark-1.4.0
>>
>> bin\run-sample org.apache.spark.examples.HdfsTest
>>
>> but got the exception below, anybody any idea what was wrong here?
>>
>> 15/09/28 16:33:56.565 ERROR SparkContext: Error initializing SparkContext.
>> java.lang.NullPointerException
>>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:418)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
>>         at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
>>         at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:130)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:515)
>>         at org.apache.spark.examples.HdfsTest$.main(HdfsTest.scala:32)
>>         at org.apache.spark.examples.HdfsTest.main(HdfsTest.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Re: Failed to run Spark sample on Windows
See http://stackoverflow.com/questions/26516865/is-it-possible-to-run-hadoop-jobs-like-the-wordcount-sample-in-the-local-mode, https://issues.apache.org/jira/browse/SPARK-6961 and finally https://issues.apache.org/jira/browse/HADOOP-10775. The easy solution is to download a Windows Hadoop distribution and point %HADOOP_HOME% to that location so winutils.exe can be picked up.
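A minimal sketch of that workaround on Windows (C:\hadoop is a hypothetical install path; its bin\ directory must contain winutils.exe):

    set HADOOP_HOME=C:\hadoop
    set PATH=%HADOOP_HOME%\bin;%PATH%
    bin\run-example HdfsTest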