Re: [Spark Structured Streaming]: Is it possible to ingest data from a jdbc data source incrementally?

2017-01-03 Thread Tamas Szuromi
You can also try https://github.com/zendesk/maxwell

Tamas

On 3 January 2017 at 12:25, Amrit Jangid  wrote:

> You can try out *debezium*: https://github.com/debezium. It reads data
> from binlogs, provides structure, and streams the changes into Kafka.
>
> Kafka can then be your new source for streaming.
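>
> A minimal sketch of consuming such a change-log topic with Structured
> Streaming (assuming the spark-sql-kafka-0-10 package is on the classpath;
> the broker list and topic name below are placeholders):
>
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().appName("cdc-from-kafka").getOrCreate()
>
> // Read the change records that Debezium/Maxwell wrote to Kafka.
> val changes = spark.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092") // placeholder brokers
>   .option("subscribe", "mydb.mytable")                            // placeholder topic
>   .load()
>   .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")     // payload is JSON text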
>
> On Tue, Jan 3, 2017 at 4:36 PM, Yuanzhe Yang  wrote:
>
>> Hi Hongdi,
>>
>> Thanks a lot for your suggestion. The data is truly immutable and the
>> table is append-only. But there are actually several different databases
>> involved, so the only feature they share, and that I can depend on, is jdbc...
>>
>> Best regards,
>> Yang
>>
>>
>> 2016-12-30 6:45 GMT+01:00 任弘迪 :
>>
>>> Why not sync the MySQL binlog (hopefully the data is immutable and the
>>> table is append-only), send the log through Kafka, and then consume it with
>>> Spark Streaming?
>>>
>>> On Fri, Dec 30, 2016 at 9:01 AM, Michael Armbrust <
>>> mich...@databricks.com> wrote:
>>>
 We don't support this yet, but I've opened this JIRA as it sounds
 generally useful: https://issues.apache.org/jira/browse/SPARK-19031

 In the meantime you could try implementing your own Source, but that
 is pretty low level and is not yet a stable API.
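
A very rough skeleton of such a custom Source, for orientation only (the
Source trait lives in the internal org.apache.spark.sql.execution.streaming
package and its exact shape changes between Spark versions, so treat this as
a sketch rather than a supported API):

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.types.StructType

class JdbcIncrementalSource(sqlContext: SQLContext) extends Source {

  // Schema of the rows this source produces.
  override def schema: StructType = ???

  // Highest offset currently available, e.g. the latest ingested timestamp.
  override def getOffset: Option[Offset] = ???

  // Rows between the two offsets, typically fetched with a bounded JDBC query.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

  override def stop(): Unit = ()
}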

 On Thu, Dec 29, 2016 at 4:05 AM, "Yuanzhe Yang (杨远哲)" <
 yyz1...@gmail.com> wrote:

> Hi all,
>
> Thanks a lot for your contributions to bring us new technologies.
>
> I don't want to waste your time, so before writing to you I googled and
> checked Stack Overflow and the mailing list archive with the keywords
> "streaming" and "jdbc", but I was not able to find any solution to my use
> case. I hope I can get some clarification from you.
>
> The use case is quite straightforward: I need to harvest a relational
> database via jdbc, do something with the data, and store the result in
> Kafka. I am stuck at the first step, and the difficulty is as follows:
>
> 1. The database is too large to ingest with one thread.
> 2. The database is dynamic and time series data comes in constantly.
>
> An ideal workflow would then be for multiple workers to process partitions
> of data incrementally according to a time window; for example, the
> processing starts from the earliest data, with each batch containing data
> for one hour. If the ingestion speed is faster than the data production
> speed, then eventually the entire database will be harvested and those
> workers will start to "tail" the database for new data, and the processing
> becomes real time.
>
> With Spark SQL I can ingest data from a JDBC source with partitions
> divided by time windows, but how can I dynamically increment the time
> windows during execution? Assume there are two workers ingesting the data
> of 2017-01-01 and 2017-01-02; the one that finishes more quickly gets the
> next task, for 2017-01-03. But I am not able to find out how to increment
> those values during execution.
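>
> For reference, a minimal sketch of the partitioned JDBC read described
> above (the URL, credentials, and the numeric epoch-seconds column used for
> partitioning are assumptions about the schema):
>
> import java.util.Properties
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().appName("jdbc-ingest").getOrCreate()
>
> val props = new Properties()
> props.setProperty("user", "ingest")      // placeholder credentials
> props.setProperty("password", "secret")
>
> // One bounded query per partition over the numeric partition column; the
> // bounds are fixed when the job is submitted, which is exactly the
> // limitation described above.
> val day = spark.read.jdbc(
>   "jdbc:postgresql://dbhost:5432/eventsdb", // placeholder JDBC URL
>   "events",                                 // table
>   "event_ts_epoch",                         // numeric partition column (epoch seconds)
>   1483228800L,                              // lower bound: 2017-01-01 00:00:00 UTC
>   1483315200L,                              // upper bound: 2017-01-02 00:00:00 UTC
>   24,                                       // roughly one partition per hour
>   props)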
>
> Then I looked into Structured Streaming. It looks much more promising,
> because window operations based on event time are supported during
> streaming, which could be the solution to my use case. However, in the
> documentation and code examples I did not find anything related to
> streaming data from a growing database. Is there anything I can read to
> achieve my goal?
>
> Any suggestion is highly appreciated. Thank you very much and have a
> nice day.
>
> Best regards,
> Yang
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

>>>
>>
>
>
> --
>
> Regards,
> Amrit
> Data Team
>


Mesos coarse-grained problem with spark.shuffle.service.enabled

2016-09-07 Thread Tamas Szuromi
Hello,

We have been using Spark on Mesos with fine-grained mode in production for a
while.
Since Spark 2.0 the fine-grained mode is deprecated, so we'd like to shift to
dynamic allocation.

When I tried to set up dynamic allocation I ran into the following
problem:
I set spark.shuffle.service.enabled = true
and spark.dynamicAllocation.enabled = true, as the documentation says. We're
using Spark on Mesos with spark.executor.uri, where we download the
pipeline's corresponding Spark version from HDFS. The documentation also
says that in Mesos coarse-grained mode you should run
$SPARK_HOME/sbin/start-mesos-shuffle-service.sh on all slave nodes. But how
is it possible to launch it before starting the application, if the given
Spark distribution is only downloaded to the Mesos executor after the
executor launches, while the executor looks for a running external shuffle
service in advance?

Is it possible that I simply can't use spark.executor.uri and
spark.dynamicAllocation.enabled together?
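
For reference, a minimal sketch of the configuration combination in question
(the master URL and executor URI are placeholders; the external shuffle
service is assumed to already be running on every agent):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("mesos://zk://master1:2181,master2:2181,master3:2181/mesos")   // placeholder
  .set("spark.executor.uri", "hdfs:///spark/spark-2.0.0-bin-hadoop2.7.tgz") // placeholder
  .set("spark.shuffle.service.enabled", "true")   // expects a running external shuffle service
  .set("spark.dynamicAllocation.enabled", "true")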

Thanks in advance!

Tamas


Re: Zeppelin Spark with Dynamic Allocation

2016-07-11 Thread Tamas Szuromi
Hello,

Which Spark version do you use? I have the same issue with Spark 1.6.1, and
there is a ticket for it somewhere.

cheers,




Tamas Szuromi

Data Analyst

*Skype: *tromika
*E-mail: *tamas.szur...@odigeo.com <n...@odigeo.com>


ODIGEO Hungary Kft.
1066 Budapest
Weiner Leó u. 16.

www.liligo.com  <http://www.liligo.com/>
check out our newest video  <http://www.youtube.com/user/liligo>



On 11 July 2016 at 10:09, Chanh Le <giaosu...@gmail.com> wrote:

> Hi everybody,
> I am testing Zeppelin with dynamic allocation, but it seems it's not working.
>
>
>
>
>
>
> In the logs I received, I saw that the SparkContext was created successfully
> and the task was running, but after that it was terminated.
> Any ideas on that?
> Thanks.
>
>
>
>  INFO [2016-07-11 15:03:40,096] ({Thread-0}
> RemoteInterpreterServer.java[run]:81) - Starting remote interpreter server
> on port 24994
>  INFO [2016-07-11 15:03:40,471] ({pool-1-thread-2}
> RemoteInterpreterServer.java[createInterpreter]:169) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkInterpreter
>  INFO [2016-07-11 15:03:40,521] ({pool-1-thread-2}
> RemoteInterpreterServer.java[createInterpreter]:169) - Instantiate
> interpreter org.apache.zeppelin.spark.PySparkInterpreter
>  INFO [2016-07-11 15:03:40,526] ({pool-1-thread-2}
> RemoteInterpreterServer.java[createInterpreter]:169) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkRInterpreter
>  INFO [2016-07-11 15:03:40,528] ({pool-1-thread-2}
> RemoteInterpreterServer.java[createInterpreter]:169) - Instantiate
> interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
>  INFO [2016-07-11 15:03:40,531] ({pool-1-thread-2}
> RemoteInterpreterServer.java[createInterpreter]:169) - Instantiate
> interpreter org.apache.zeppelin.spark.DepInterpreter
>  INFO [2016-07-11 15:03:40,563] ({pool-2-thread-5}
> SchedulerFactory.java[jobStarted]:131) - Job
> remoteInterpretJob_1468224220562 started by scheduler
> org.apache.zeppelin.spark.SparkInterpreter998491254
>  WARN [2016-07-11 15:03:41,559] ({pool-2-thread-5}
> NativeCodeLoader.java[]:62) - Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
>  INFO [2016-07-11 15:03:41,703] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Changing view acls to: root
>  INFO [2016-07-11 15:03:41,704] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Changing modify acls to: root
>  INFO [2016-07-11 15:03:41,708] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui
> acls disabled; users with view permissions: Set(root); users with modify
> permissions: Set(root)
>  INFO [2016-07-11 15:03:41,977] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Starting HTTP Server
>  INFO [2016-07-11 15:03:42,029] ({pool-2-thread-5}
> Server.java[doStart]:272) - jetty-8.y.z-SNAPSHOT
>  INFO [2016-07-11 15:03:42,047] ({pool-2-thread-5}
> AbstractConnector.java[doStart]:338) - Started SocketConnector@0.0.0.0
> :53313
>  INFO [2016-07-11 15:03:42,048] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Successfully started service 'HTTP class
> server' on port 53313.
> * INFO [2016-07-11 15:03:43,978] ({pool-2-thread-5}
> SparkInterpreter.java[createSparkContext]:233) - -- Create new
> SparkContext mesos://zk://master1:2181,master2:2181,master3:2181/mesos
> ---*
>  INFO [2016-07-11 15:03:44,003] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Running Spark version 1.6.1
>  INFO [2016-07-11 15:03:44,036] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Changing view acls to: root
>  INFO [2016-07-11 15:03:44,036] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Changing modify acls to: root
>  INFO [2016-07-11 15:03:44,037] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - SecurityManager: authentication disabled; ui
> acls disabled; users with view permissions: Set(root); users with modify
> permissions: Set(root)
>  INFO [2016-07-11 15:03:44,231] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Successfully started service 'sparkDriver' on
> port 33913.
>  INFO [2016-07-11 15:03:44,552]
> ({sparkDriverActorSystem-akka.actor.default-dispatcher-4}
> Slf4jLogger.scala[applyOrElse]:80) - Slf4jLogger started
>  INFO [2016-07-11 15:03:44,597]
> ({sparkDriverActorSystem-akka.actor.default-dispatcher-4}
> Slf4jLogger.scala[apply$mcV$sp]:74) - Starting remoting
>  INFO [2016-07-11 15:03:44,754]
> ({sparkDriverActorSystem-akka.actor.default-dispatcher-4}
> Slf4jLogger.scala[apply$mcV$sp]:74) - Remoting started; listening on
> addresses :[akka.tcp://sparkDriverActorSystem@10.197.0.3:55213]
>  INFO [2016-07-11 15:03:44,760] ({pool-2-thread-5}
> Logging.scala[logInfo]:58) - Successfully started service
> 'sparkDriver

Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql

2016-03-24 Thread Tamas Szuromi
Actually, you should run sql("select paymentdate,
unix_timestamp(paymentdate, 'dd/MM/yyyy') from tmp").first


But keep in mind you will get a unix timestamp!
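
If you need a date-typed value back, a minimal sketch (assuming the same tmp
table) is to parse with unix_timestamp and convert back with
from_unixtime/to_date:

sql("""select paymentdate,
              to_date(from_unixtime(unix_timestamp(paymentdate, 'dd/MM/yyyy'))) as payment_dt
       from tmp""").first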


On 24 March 2016 at 17:29, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks guys.
>
> Unfortunately neither is working
>
>  sql("select paymentdate, unix_timestamp(paymentdate) from tmp").first
> res28: org.apache.spark.sql.Row = [10/02/2014,null]
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 24 March 2016 at 14:23, Ajay Chander <itsche...@gmail.com> wrote:
>
>> Mich,
>>
>> Can you try the value of paymentdate in this
>> format, paymentdate='2015-01-01 23:59:59', with to_date(paymentdate), and see
>> if it helps.
>>
>>
>> On Thursday, March 24, 2016, Tamas Szuromi
>> <tamas.szur...@odigeo.com.invalid> wrote:
>>
>>> Hi Mich,
>>>
>>> Take a look
>>> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column,%20java.lang.String)
>>>
>>> cheers,
>>> Tamas
>>>
>>>
>>> On 24 March 2016 at 14:29, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I am trying to convert a date in Spark temporary table
>>>>
>>>> Tried few approaches.
>>>>
>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp")
>>>> res21: org.apache.spark.sql.DataFrame = [paymentdate: string, _c1: date]
>>>>
>>>>
>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp").first
>>>> *res22: org.apache.spark.sql.Row = [10/02/2014,null]*
>>>>
>>>> My date is stored as a String in dd/MM/yyyy format, as shown above. However,
>>>> to_date() returns null!
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>>
>


Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql

2016-03-24 Thread Tamas Szuromi
Hi Mich,

Take a look
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column,%20java.lang.String)

cheers,
Tamas


On 24 March 2016 at 14:29, Mich Talebzadeh 
wrote:

>
> Hi,
>
> I am trying to convert a date in Spark temporary table
>
> Tried few approaches.
>
> scala> sql("select paymentdate, to_date(paymentdate) from tmp")
> res21: org.apache.spark.sql.DataFrame = [paymentdate: string, _c1: date]
>
>
> scala> sql("select paymentdate, to_date(paymentdate) from tmp").first
> *res22: org.apache.spark.sql.Row = [10/02/2014,null]*
>
> My date is stored as a String in dd/MM/yyyy format, as shown above. However, to_date()
> returns null!
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


Re: OOM When Running with Mesos Fine-grained Mode

2016-03-05 Thread Tamas Szuromi
Hey,
We had the same issue with Spark 1.5.x, and it disappeared after we upgraded to 1.6.

Tamas

On Saturday, 5 March 2016, SLiZn Liu  wrote:

> Hi Spark Mailing List,
>
> I'm running Spark jobs over terabytes of text files on Mesos. The jobs ran
> fine until we decided to switch to Mesos fine-grained mode.
>
> At first glance, we spotted a massive number of task-lost errors in the logs:
>
> 16/03/05 04:01:20 ERROR TaskSchedulerImpl: Ignoring update with state LOST 
> for TID 14420 because its task set is gone (this is likely the result of 
> receiving duplicate task finished status updates)
> 16/03/05 04:01:20 WARN TaskSetManager: Lost task 122.0 in stage 10.0 (TID 
> 13901, ourhost.com): java.io.FileNotFoundException: 
> /home/mesos/mesos-slave/slaves/20160222-161607-2315648778-5050-44877-S0/frameworks/20160222-183113-2332425994-5050-54405-0145/executors/20160222-161607-2315648778-5050-44877-S0/runs/62137cc2-317e-4500-982b-0007106aec40/blockmgr-16b8353c-ac6c-4019-b8e7-a16659cf6fe2/33/shuffle_2_122_0.index.8a14cde6-2877-4634-b4c2-fc9384f2ce8d
>  (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.(FileOutputStream.java:213)
> at java.io.FileOutputStream.(FileOutputStream.java:162)
> at 
> org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:141)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:161)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I don't know if the first line, the task scheduler error, is related; I asked
> on this mailing list before but had no luck finding the cause.
>
> As I dug further, I found the following OOM exception:
>
> 16/03/05 04:01:20 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
> thread Thread[Executor task launch worker-83,5,main]
> java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 
> 160165
> at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:91)
> at 
> org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:735)
> at 
> org.apache.spark.unsafe.map.BytesToBytesMap.(BytesToBytesMap.java:197)
> at 
> org.apache.spark.unsafe.map.BytesToBytesMap.(BytesToBytesMap.java:212)
> at 
> org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.(UnsafeFixedWidthAggregationMap.java:103)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.(TungstenAggregationIterator.scala:483)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
> at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Does anyone know if this is a bug, or if some configuration is wrong?
> --
>
> BR,
> Todd Leo
> ​
>


Re: spark driver in docker

2016-03-05 Thread Tamas Szuromi
Hi, have a look at
http://spark.apache.org/docs/latest/configuration.html to see which
ports need to be exposed. With Mesos we had a lot of problems with
container networking, but yes, --net=host is a shortcut.
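
A small sketch of the driver-side settings usually involved (the hostname and
port numbers are arbitrary examples; spark.driver.host must be an address the
workers can reach, and the chosen ports must be published by the container):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.host", "driver.example.com") // placeholder routable address
  .set("spark.driver.port", "7001")               // RPC port executors connect back to
  .set("spark.blockManager.port", "7002")
  .set("spark.ui.port", "4040")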

Tamas



On 4 March 2016 at 22:37, yanlin wang  wrote:

> We would like to run multiple Spark drivers in Docker containers. Any
> suggestions for the port exposure and network settings for Docker so the
> driver is reachable by the worker nodes? --net="host" is the last thing we want to do.
>
> Thx
> Yanlin
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark runs only on Mesos v0.21?

2016-02-12 Thread Tamas Szuromi
Hello Petr,

We're running Spark 1.5.2 and 1.6.0 on Mesos 0.25.0 without any problem. We
upgraded from 0.21.0 originally.

cheers,
Tamas




On 12 February 2016 at 09:31, Petr Novak  wrote:

> Hi all,
> based on the documentation:
>
> "Spark 1.6.0 is designed for use with Mesos 0.21.0 and does not require
> any special patches of Mesos."
>
> We are considering Mesos for our use, but this concerns me a lot. Mesos is
> currently on v0.27, which we need for its volumes feature, but Spark locks
> us to 0.21 only. I understand the problem is that Mesos is not at 1.0 yet
> and makes breaking changes to its API. But when leading frameworks don't
> catch up fast, it defeats the whole purpose of Mesos - running on one
> unified platform and sharing resources - since a single framework can block
> a Mesos upgrade.
>
> We don't want to develop our own services against the Mesos 0.21 API.
>
> Is there a reason why Spark is so far behind? Does Spark actually care
> about Mesos and its support, or has the focus moved to YARN?
>
> Many thanks,
> Petr
>


Re: how to use sc.hadoopConfiguration from pyspark

2015-11-23 Thread Tamas Szuromi
Hello Eike,

Thanks! Yes, I'm using it with Hadoop 2.6, so I'll give the 2.4 build a try.
Have you tried it with the 1.6 snapshot, or do you know of JIRA tickets for
these missing-library issues?

Tamas





On 23 November 2015 at 10:21, Eike von Seggern <eike.segg...@sevenval.com>
wrote:

> Hello Tamas,
>
> 2015-11-20 17:23 GMT+01:00 Tamas Szuromi <tamas.szur...@odigeo.com>:
> >
> > Hello,
> >
> > I've just wanted to use sc._jsc.hadoopConfiguration().set('key','value')
> in pyspark 1.5.2 but I got set method not exists error.
>
>
> For me it's working with Spark 1.5.2 binary distribution built against
> Hadoop 2.4 (spark-1.5.2-bin-hadoop2.4):
>
>   1: sc._jsc.hadoopConfiguration().set
>   => 
>   2: sc._jsc.hadoopConfiguration().set("foo", "bar")
>
> Are you using the version built against Hadoop 2.6? I remember there
> were problems with missing libraries (or similar).
>
> Best
>
> Eike
>


how to use sc.hadoopConfiguration from pyspark

2015-11-20 Thread Tamas Szuromi
Hello,

I just wanted to use sc._jsc.hadoopConfiguration().set('key', 'value') in
pyspark 1.5.2, but I got an error saying the set method does not exist.

Is there anyone who knows a workaround to set some HDFS-related properties
like dfs.blocksize?
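
One possible workaround sketch: keys prefixed with spark.hadoop. are copied
into the job's Hadoop Configuration, so dfs.blocksize can be set without
touching hadoopConfiguration directly (the block size value is just an
example, and the same key/value pair can be passed from pyspark via SparkConf
or spark-submit --conf):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("blocksize-example")
  .setMaster("local[*]")                           // placeholder master
  .set("spark.hadoop.dfs.blocksize", "268435456")  // 256 MB, example value
val sc = new SparkContext(conf)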

Thanks in advance!

Tamas


Re: ClassNotFound for exception class in Spark 1.5.x

2015-11-19 Thread Tamas Szuromi
Hi Zsolt,

How do you load the jar, and how do you prepend it to the classpath?

Tamas




On 19 November 2015 at 11:02, Zsolt Tóth  wrote:

> Hi,
>
> I am trying to throw an exception of my own exception class (MyException extends
> SparkException) on one of the executors. This works fine on Spark 1.3.x and
> 1.4.x, but throws a deserialization/ClassNotFound exception on Spark 1.5.x.
> This happens only when I throw it on an executor; on the driver it
> succeeds. I'm using Spark in yarn-cluster mode.
>
> Is this a known issue? Is there any workaround for it?
>
> StackTrace:
>
> 15/11/18 15:00:17 WARN spark.ThrowableSerializationWrapper: Task exception 
> could not be deserialized
> java.lang.ClassNotFoundException: org.example.MyException
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:270)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:163)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:108)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$2.apply(TaskResultGetter.scala:105)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>   at 
> org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/11/18 15:00:17 ERROR scheduler.TaskResultGetter: Could not deserialize 
> TaskEndReason: ClassNotFound with classloader 
> org.apache.spark.util.MutableURLClassLoader@7578da02
> 15/11/18 15:00:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 
> (TID 30, hadoop2.local.dmlab.hu): UnknownReason
>
> Regards,
> Zsolt
>


Re: Spark, Mesos problems with remote connections

2015-11-02 Thread Tamas Szuromi
Hello Sebastian,

Did you set the MESOS_NATIVE_JAVA_LIBRARY environment variable before you
started pyspark?

cheers,

Tamas




On 2 November 2015 at 15:24, Sebastian Kuepers <
sebastian.kuep...@publicispixelpark.de> wrote:

> Hey,
>
>
> I have a Mesos cluster with a single Master. If I run the following
> directly on the master machine:
>
>
> pyspark --master mesos://host:5050
>
>
> everything works just fine. If I try to connect to the master by
> starting a driver from my laptop, everything stops after the following
> log output from the Spark driver:
>
>
> I1102 15:10:57.848831 64856064 sched.cpp:164] Version: 0.24.0
> I1102 15:10:57.852708 19017728 sched.cpp:262] New master detected at
> master@217.66.55.19:5050
> I1102 15:10:57.852934 19017728 sched.cpp:272] No credentials provided.
> Attempting to register without authentication
>
>
> The Mesos master logs show the following:
>
>
> I1102 15:09:50.676004 21563 master.cpp:2250] Subscribing framework Talos
> with checkpointing disabled and capabilities [  ]
> I1102 15:09:50.676686 21566 hierarchical.hpp:515] Added framework
> b995b024-6ec6-4bd9-b251-e4e6b1828eda-0004
> I1102 15:09:51.671437 21562 http.cpp:336] HTTP GET for /master/state.json
> from 217.66.51.150:51588 with User-Agent='Mozilla/5.0 (X11; Linux x86_64)
> AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36'
> I1102 15:09:52.357230 21567 master.cpp:2179] Received SUBSCRIBE call for
> framework 'Talos' at
> scheduler-fee376b3-ea04-4866-8412-957721da5a0b@10.200.120.148:51085
> I1102 15:09:52.357416 21567 master.cpp:2250] Subscribing framework Talos
> with checkpointing disabled and capabilities [  ]
> I1102 15:09:52.357520 21567 master.cpp:2260] Framework
> b995b024-6ec6-4bd9-b251-e4e6b1828eda-0004 (Talos) at
> scheduler-fee376b3-ea04-4866-8412-957721da5a0b@10.200.120.148:51085
> already subscribed, resending acknowledgement
> I1102 15:09:53.938227 21563 master.cpp:2179] Received SUBSCRIBE call for
> framework 'Talos' at
> scheduler-fee376b3-ea04-4866-8412-957721da5a0b@10.200.120.148:51085
> I1102 15:09:53.938426 21563 master.cpp:2250] Subscribing framework Talos
> with checkpointing disabled and capabilities [  ]
> I1102 15:09:53.938534 21563 master.cpp:2260] Framework
> b995b024-6ec6-4bd9-b251-e4e6b1828eda-0004 (Talos) at
> scheduler-fee376b3-ea04-4866-8412-957721da5a0b@10.200.120.148:51085
> already subscribed, resending acknowledgement
> I1102 15:10:00.207372 21567 master.cpp:2179] Received SUBSCRIBE call for
> framework 'Talos' at
> scheduler-fee376b3-ea04-4866-8412-957721da5a0b@10.200.120.148:51085
>
>
> The framework also shows up as active in the UI and blocks all resources
> of the cluster, while the SUBSCRIBE calls keep coming in.
>
> Mesos authentication is completely disabled.
>
>
> What could be possible causes for this problem?
>
>
> Thanks,
>
> Sebastian
>
>
>
>
> 
> Disclaimer The information in this email and any attachments may contain
> proprietary and confidential information that is intended for the
> addressee(s) only. If you are not the intended recipient, you are hereby
> notified that any disclosure, copying, distribution, retention or use of
> the contents of this information is prohibited. When addressed to our
> clients or vendors, any information contained in this e-mail or any
> attachments is subject to the terms and conditions in any governing
> contract. If you have received this e-mail in error, please immediately
> contact the sender and delete the e-mail.
>


looking for HDP users

2015-10-05 Thread Tamas Szuromi
Hello,

I'm looking for someone who is using the Hortonworks Data Platform, especially
2.3, and is also using Spark 1.5.x.

I have the following issue with HDP, and I wanted to know whether it is a
general bug with HDP or just a local issue.

https://issues.apache.org/jira/browse/SPARK-10896

Thanks in advance!


*Tamas*