How to do dispatching in Streaming?

2015-04-12 Thread Jianshi Huang
Hi, I have a Kafka topic that contains dozens of different types of messages. And for each one I'll need to create a DStream for it. Currently I have to filter the Kafka stream over and over, which is very inefficient. So what's the best way to do dispatching in Spark Streaming? (one DStream -

counters in spark

2015-04-12 Thread Grandl Robert
Hi guys, I was trying to figure out some counters in Spark related to the amount of CPU or memory used (in some metric) by a task/stage/job, but I could not find any. Is there any such counter available? Thank you, Robert

Re: How to use Joda Time with Spark SQL?

2015-04-12 Thread Justin Yip
Cheng, this is great info. I have a follow-up question. There are a few very common data types (e.g. Joda DateTime) that are not directly supported by SparkSQL. Do you know if there are any plans for accommodating some common data types in SparkSQL? They don't need to be a first-class datatype, but

Re: Spark TeraSort source request

2015-04-12 Thread Ewan Higgs
Hi all. The code is linked from my repo: https://github.com/ehiggs/spark-terasort This is an example Spark program for running TeraSort benchmarks. It is based on work from Reynold Xin's branch https://github.com/rxin/spark/tree/terasort, but it is not the same TeraSort program that

RE: How to use Joda Time with Spark SQL?

2015-04-12 Thread Wang, Daoyuan
Actually, I did a little investigation on Joda-Time when I was working on SPARK-4987 for Timestamp ser-de in Parquet format. I think Joda offers an interface to get a Java object from a Joda-Time object natively. For example, to transform a java.util.Date (parent of java.sql.Date and

Re: How to use Joda Time with Spark SQL?

2015-04-12 Thread Cheng Lian
These common UDTs can always be wrapped in libraries and published to spark-packages http://spark-packages.org/ :-) Cheng On 4/12/15 3:00 PM, Justin Yip wrote: Cheng, this is great info. I have a follow-up question. There are a few very common data types (e.g. Joda DateTime) that are not

Re: function to convert to pair

2015-04-12 Thread Jeetendra Gangele
I have to create some kind of index from my JavaRDD<Object>. It should be something like JavaPairRDD<uniqueindex, Object>, but zipWithIndex gives <Object, Long>. Later I need to use this RDD for a join, so it looks like it won't work for me. On 9 April 2015 at 04:17, Ted Yu yuzhih...@gmail.com wrote: Please

Re: regarding ZipWithIndex

2015-04-12 Thread Ted Yu
bq. will return something like JavaPairRDD<Object, long> The long component of the pair fits your description of index. What other requirement does ZipWithIndex not provide you? Cheers On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I have an RDD

regarding ZipWithIndex

2015-04-12 Thread Jeetendra Gangele
Hi All, I have an RDD JavaRDD<Object> and I want to convert it to JavaPairRDD<Index,Object>. The index should be unique and it should maintain the order. For the first object it should be 1, for the second 2, and so on. I tried using ZipWithIndex but it will return something like JavaPairRDD<Object, long>
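
The reshaping asked about in this thread (an index-keyed pair with indices starting at 1) can be sketched in plain Python. This is a minimal illustration and not Spark code: `zip_with_index` and `to_index_keyed` are hypothetical helper names that only mimic the shape of the data. In Spark itself, `zipWithIndex` yields (element, index) pairs with indices starting at 0, so a likely approach, stated here as an assumption rather than the thread's confirmed answer, is to follow it with a `mapToPair` that swaps each pair and adds 1 to the index.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(items: Iterable[T]) -> List[Tuple[T, int]]:
    """Plain-Python stand-in for zipWithIndex: (element, 0-based index)."""
    return [(item, i) for i, item in enumerate(items)]


def to_index_keyed(pairs: Iterable[Tuple[T, int]], start: int = 1) -> List[Tuple[int, T]]:
    """Swap each (element, index) pair to (index, element).

    The 0-based index is shifted so that the first element gets `start`.
    """
    return [(index + start, item) for item, index in pairs]


# Hypothetical sample data standing in for the RDD contents.
records = ["a", "b", "c"]
indexed = to_index_keyed(zip_with_index(records))
```

Because the index becomes the first component of each pair, the result has the key-first shape that a join on the index would need, and the original order is preserved.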