Re: Reading lzo+index with spark-csv (Splittable reads)

2016-01-29 Thread syepes
Well, looking at the source it looks like it's not implemented: https://github.com/databricks/spark-csv/blob/master/src/main/scala/com/databricks/spark/csv/util/TextFile.scala#L34-L36

Reading lzo+index with spark-csv (Splittable reads)

2016-01-29 Thread syepes
Hello, I have managed to speed up the read stage when loading CSV files by using the classic "newAPIHadoopFile" method. The issue is that I would like to use the spark-csv package, and it seems it does not take the LZO index file / splittable reads into consideration. # Using the classic method
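For reference, a minimal sketch of the classic approach (Scala, Spark 1.x API), assuming the hadoop-lzo classes are on the classpath and the .lzo files have already been indexed so the input is splittable; the path and the final split(",") are only illustrative:

  import com.hadoop.mapreduce.LzoTextInputFormat
  import org.apache.hadoop.io.{LongWritable, Text}

  // Read the indexed .lzo files with the splittable input format from hadoop-lzo.
  val lines = sc.newAPIHadoopFile(
    "hdfs:///data/csv/*.lzo",              // hypothetical input path
    classOf[LzoTextInputFormat],
    classOf[LongWritable],
    classOf[Text]
  ).map(_._2.toString)                     // keep only the line contents

  // Naive field split; spark-csv would normally handle quoting, headers and types.
  val rows = lines.map(_.split(","))

With an index file present, each LZO block becomes its own input split, so the read is parallelised across tasks instead of being handled by a single one.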

Spark 1.6 - YARN Cluster Mode

2015-12-17 Thread syepes
Hello, This week I have been testing 1.6 (#d509194b) on our HDP 2.3 platform and it has been working pretty well, with the exception of the YARN cluster deployment mode. Note that with 1.5, using the same "spark-props.conf" and "spark-env.sh" config files, cluster mode works as expected. Has anyone

Re: [Yarn-Client]Can not access SparkUI

2015-10-26 Thread syepes
Hello Earthson, Is your cluster multihomed? If yes, try setting the variables SPARK_LOCAL_{IP,HOSTNAME}. I had this issue before: https://issues.apache.org/jira/browse/SPARK-11147
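For completeness, this is roughly what that looks like in conf/spark-env.sh on each node (the address and hostname below are placeholders for whichever interface the Spark traffic should use):

  # conf/spark-env.sh
  export SPARK_LOCAL_IP=10.0.0.12
  export SPARK_LOCAL_HOSTNAME=worker-01.internal.example.com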

Re: Kafka createDirectStream issue

2015-06-24 Thread syepes
Hello, Thanks for all the help in resolving this issue, especially to Cody, who guided me to the solution. For others facing similar issues: basically, the problem was that I was running Spark Streaming jobs from the spark-shell, and this is not supported. Running the same job through spark-submit

Kafka createDirectStream issue

2015-06-23 Thread syepes
Hello, I am trying to use the new Kafka consumer KafkaUtils.createDirectStream but I am having some issues making it work. I have tried different versions of Spark, v1.4.0 and branch-1.4 #8d6e363, and I am still getting the same strange exception ClassNotFoundException: $line49.$read$$iwC$$i
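For context, the call being attempted is roughly the following (a minimal sketch against the Spark 1.4 spark-streaming-kafka API; broker addresses, topic name and batch interval are placeholders):

  import kafka.serializer.StringDecoder
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val ssc = new StreamingContext(sc, Seconds(5))
  val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
  val topics = Set("events")

  // Direct (receiver-less) stream: one Kafka partition maps to one Spark partition.
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)

  stream.map(_._2).print()
  ssc.start()
  ssc.awaitTermination()

As the follow-up in this thread notes, this only runs reliably when packaged and launched with spark-submit rather than typed into spark-shell.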

Re: Kafka createDirectStream issue

2015-06-23 Thread syepes
Yes, I have two clusters, one standalone and another using Mesos. Sebastian YEPES http://sebastian-yepes.com On Wed, Jun 24, 2015 at 12:37 AM, drarse [via Apache Spark User List] ml-node+s1001560n23457...@n3.nabble.com wrote: Hi syepes, Are you running the application in standalone mode? Regards

Re: spark job progress-style report on console ?

2015-04-15 Thread syepes
Just add the following line to your conf/spark-defaults.conf file: spark.ui.showConsoleProgress true
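For clarity, the entry as it would appear in the file:

  # conf/spark-defaults.conf
  spark.ui.showConsoleProgress   true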

EventLog / Timeline calculation - Optimization

2015-02-24 Thread syepes
Hello, For the past few days I have been trying to process and analyse a Cassandra eventLog table, similar to the one shown here, with Spark. Basically, what I want to calculate is the time delta (epoch difference) between each event type for all the device IDs in the table. Currently it is working as expected but I
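A minimal sketch of that delta calculation (Scala, assuming the DataStax spark-cassandra-connector and a hypothetical "iot"."eventlog" table with device_id, event_type and epoch columns):

  import com.datastax.spark.connector._
  import org.apache.spark.SparkContext._   // pair-RDD functions on older Spark versions

  case class Event(deviceId: String, eventType: String, epoch: Long)

  val events = sc.cassandraTable[Event]("iot", "eventlog")

  // For every device, order its events by time and emit the gap to the previous event.
  val deltas = events
    .groupBy(_.deviceId)
    .flatMapValues { evs =>
      val sorted = evs.toSeq.sortBy(_.epoch)
      sorted.sliding(2).collect { case Seq(prev, cur) =>
        (cur.eventType, cur.epoch - prev.epoch)
      }
    }

Note that groupBy ships each device's full event list to a single task, which is fine for modest per-device volumes; for very busy devices a sort-within-partitions approach scales better.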