Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Rachana Srivastava
Hello all,

I am trying to test JavaKafkaWordCount on YARN; to make sure YARN is working
fine, I am saving the output to HDFS. The example works fine in local mode but
not in yarn mode: when I change the master to yarn-client or yarn-cluster, I see
no output and cannot find any errors logged either. For my application ID I
looked for logs under /var/log/hadoop-yarn/containers (e.g.
/var/log/hadoop-yarn/containers/application_1439517792099_0010/container_1439517792099_0010_01_03/stderr)
but could not find any useful information there. Is that the only location where
application logs are written?
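Roughly, the job I am running is the Scala sketch below (the ZooKeeper quorum, Kafka topic and HDFS output path are placeholders, not the real values):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCountToHdfs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaWordCountToHdfs"), Seconds(10))
    // Receiver-based Kafka stream, as in the JavaKafkaWordCount example.
    val lines = KafkaUtils.createStream(ssc, "zkhost:2181", "wordcount-group", Map("test-topic" -> 1)).map(_._2)
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1L)).reduceByKey(_ + _)
    // One output directory is written per batch under this HDFS prefix.
    counts.saveAsTextFiles("hdfs:///tmp/kafka-wordcounts")
    ssc.start()
    ssc.awaitTermination()
  }
}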

I also tried directing the log output to spark.yarn.app.container.log.dir, but
got an access-denied error.

Questions: Do we need any special setup to run Spark Streaming on YARN? How do
we debug this? Where can we find more details on testing streaming on YARN?

Thanks,

Rachana


Hive permanent functions are not available in Spark SQL

2015-09-29 Thread Pala M Muthaia
Hi,

I am trying to use internal UDFs that we have added as permanent functions
in Hive from within a Spark SQL query (using HiveContext), but I get a
NoSuchObjectException, i.e. the function cannot be found.

However, if I execute the 'show functions' command in Spark SQL, the permanent
functions do appear in the list.
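A minimal sketch of what I am doing, with placeholder function and table names:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PermanentUdfTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PermanentUdfTest"))
    val hiveContext = new HiveContext(sc)

    // The permanent function shows up in this listing...
    hiveContext.sql("SHOW FUNCTIONS").collect().foreach(println)

    // ...but actually invoking it fails with NoSuchObjectException.
    hiveContext.sql("SELECT my_permanent_udf(value) FROM some_table LIMIT 10").show()
  }
}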

I am using Spark 1.4.1 with Hive 0.13.1. I tried to debug this by looking at
the logs and the code; the show functions command and the UDF query appear to
go through essentially the same code path, yet the former can see the UDF
while the latter can't.

Any ideas on how to debug/fix this?


Thanks,
pala


Spark Network Module behaviour

2015-09-29 Thread sbiookag
Dear All,

I am trying to understand how exactly the Spark network module works. Looking at
the Netty package, I would like to intercept every server response for a block
fetch. As I understand it, the place responsible for sending remote blocks is
"TransportRequestHandler.processFetchRequest". I am trying to distinguish these
flows from other flows in a switch, using the information I get from the
"channel" object in this method: the source and destination IP addresses and
the source and destination ports. I can read all four values from the channel
object. The problem arises when I look at the switch log: I see extensive
traffic going back and forth that does not belong to the flows I am capturing.
My questions are: am I missing other places that are responsible for this heavy
network traffic? Or are the port and IP I get from the channel not the correct
ones?
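For reference, this is roughly how I read the four values (a sketch, not the actual patch; the object name is mine):

import java.net.InetSocketAddress
import io.netty.channel.Channel

object FlowInfo {
  // (server IP, server port, client IP, client port) for the connection behind
  // a fetch request: localAddress() is the serving endpoint on this node,
  // remoteAddress() is the client doing the fetch.
  def flowTuple(channel: Channel): (String, Int, String, Int) = {
    val local = channel.localAddress().asInstanceOf[InetSocketAddress]
    val remote = channel.remoteAddress().asInstanceOf[InetSocketAddress]
    (local.getAddress.getHostAddress, local.getPort,
     remote.getAddress.getHostAddress, remote.getPort)
  }
}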

Thanks,
Saman



Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Archit Thakur
Hi,

We are using Spark Streaming as our processing engine, and as part of the
output we want to push data to a UI. There will be multiple users accessing the
system, each with their own filters. Based on the filters and other inputs we
want to either run a SQL query on the DStream or apply custom processing logic.
This requires the system to read the filters/query and generate the execution
graph at runtime, but I can't see any support in Spark Streaming for generating
the execution graph on the fly. I think I could broadcast the query to the
executors and read the broadcast query at runtime, but that would also limit me
to one user at a time.
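Something like the sketch below (with placeholder names) is what I mean, and it is limited to a single query because a broadcast value cannot change after it is created:

import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

object StaticQueryFilter {
  // Ships a single user's filter to the executors once, at startup.
  def attach(ssc: StreamingContext, lines: DStream[String]): DStream[String] = {
    val query = ssc.sparkContext.broadcast((line: String) => line.contains("ERROR"))
    lines.filter(line => query.value(line))
  }
}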

Do we not expect Spark Streaming to take queries/filters from the outside
world? Or does output in Spark Streaming only mean writing to an external
sink, which can then be queried?

Thanks,
Archit Thakur.


Too many executors are created

2015-09-29 Thread Ulanov, Alexander
Dear Spark developers,

I have created a simple Spark application for spark-submit. It calls a machine
learning routine from Spark MLlib that runs for a number of iterations, which
correspond to the same number of tasks in Spark. It seems that Spark creates an
executor for each task and then removes it. The following messages in my log
indicate this:

15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24463 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24463 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24463 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24463
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24464 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24464 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24464 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24464 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24464
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24465 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24465 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24465 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24465 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-/24465 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24465
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-/24466 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-/24466 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24466 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24466 is now RUNNING

It ends up creating and removing thousands of executors. Is this normal
behavior?

If I run the same code within spark-shell, this does not happen. Could you
suggest what might be wrong with my setup?
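For context, the application is essentially of the following shape (a hypothetical sketch; the actual MLlib routine and input path differ):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object IterativeMllibJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IterativeMllibJob"))
    val data = sc.textFile("hdfs:///tmp/points.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()
    // Iterative MLlib call; each iteration launches tasks on the cluster.
    val model = KMeans.train(data, 10, 100)
    model.clusterCenters.foreach(println)
    sc.stop()
  }
}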

Best regards, Alexander


Re: Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Tathagata Das
A very basic piece of support that exists in DStream is DStream.transform(),
which takes an arbitrary RDD => RDD function. That function can choose to do a
different computation at each batch interval. That may be of help to you.
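For example, something along these lines (a rough sketch; the filter holder and stream names are placeholders):

import java.util.concurrent.atomic.AtomicReference
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object DynamicQueryFilter {
  // Driver-side holder that the UI layer can update between batches.
  val currentFilter = new AtomicReference[String => Boolean]((_: String) => true)

  // transform() is re-evaluated on the driver for every batch, so each batch
  // picks up whatever filter is current at that moment.
  def attach(lines: DStream[String]): DStream[String] =
    lines.transform { rdd: RDD[String] =>
      val f = currentFilter.get()
      rdd.filter(f)
    }
}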

On Tue, Sep 29, 2015 at 12:06 PM, Archit Thakur 
wrote:

> Hi,
>
> We are using Spark Streaming as our processing engine, and as part of the
> output we want to push data to a UI. There will be multiple users accessing the
> system, each with their own filters. Based on the filters and other inputs we
> want to either run a SQL query on the DStream or apply custom processing logic.
> This requires the system to read the filters/query and generate the execution
> graph at runtime, but I can't see any support in Spark Streaming for generating
> the execution graph on the fly. I think I could broadcast the query to the
> executors and read the broadcast query at runtime, but that would also limit me
> to one user at a time.
>
> Do we not expect Spark Streaming to take queries/filters from the outside
> world? Or does output in Spark Streaming only mean writing to an external
> sink, which can then be queried?
>
> Thanks,
> Archit Thakur.
>


Re: failed to run spark sample on windows

2015-09-29 Thread Renyi Xiong
Not sure, so I downloaded the 1.4.1 release again with the "Hadoop 2.6 and
later" option from http://spark.apache.org/downloads.html, assuming the
versions are consistent, and ran the following on Windows 10:

c:\spark-1.4.1-bin-hadoop2.6>bin\run-example HdfsTest 

I still got a similar exception (below). I heard there is a permission
configuration for HDFS; if so, how do I set it up?

15/09/29 13:03:26 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:465)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:398)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:390)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:390)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

On Mon, Sep 28, 2015 at 4:39 PM, Ted Yu  wrote:

> What version of hadoop are you using ?
>
> Is that version consistent with the one which was used to build Spark
> 1.4.0 ?
>
> Cheers
>
> On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong 
> wrote:
>
>> I tried to run the HdfsTest sample on Windows with spark-1.4.0:
>>
>> bin\run-sample org.apache.spark.examples.HdfsTest 
>>
>> but got the exception below. Anybody have any idea what went wrong here?
>>
>> 15/09/28 16:33:56.565 ERROR SparkContext: Error initializing SparkContext.
>> java.lang.NullPointerException
>> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
>> at org.apache.hadoop.util.Shell.run(Shell.java:418)
>> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>> at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
>> at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
>> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
>> at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
>> at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:130)
>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:515)
>> at org.apache.spark.examples.HdfsTest$.main(HdfsTest.scala:32)
>> at org.apache.spark.examples.HdfsTest.main(HdfsTest.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>
>


Re: failed to run spark sample on windows

2015-09-29 Thread saurfang
See
http://stackoverflow.com/questions/26516865/is-it-possible-to-run-hadoop-jobs-like-the-wordcount-sample-in-the-local-mode,
https://issues.apache.org/jira/browse/SPARK-6961 and finally
https://issues.apache.org/jira/browse/HADOOP-10775. The easy solution is to
download a Windows Hadoop distribution and point %HADOOP_HOME% to that
location so winutils.exe can be picked up.
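An alternative to setting the environment variable is to set the hadoop.home.dir system property before the SparkContext is created, as in this sketch (C:\hadoop is just an example location for the directory containing bin\winutils.exe):

import org.apache.spark.{SparkConf, SparkContext}

object WindowsLocalTest {
  def main(args: Array[String]): Unit = {
    // Hadoop's Shell class checks the hadoop.home.dir system property before
    // falling back to %HADOOP_HOME% when locating bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val sc = new SparkContext(new SparkConf().setAppName("WindowsLocalTest").setMaster("local[*]"))
    println(sc.textFile(args(0)).count())
    sc.stop()
  }
}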


