Re: Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
On Tue, Oct 3, 2017 at 10:50 AM, shyla deshpande wrote: > Hi all, > I have a data pipeline using Spark Streaming, Kafka and Cassandra. > Are there any libraries to help me with complex event processing using > Spark Streaming? > > I appreciate your help. > > Thanks >

RE: Spark 2.2.0 Win 7 64 bits Exception while deleting Spark temp dir

2017-10-03 Thread JG Perrin
Do you have a little more to share with us? Maybe you can set another TEMP directory. Are you getting a result? From: usa usa [mailto:usact2...@gmail.com] Sent: Tuesday, October 03, 2017 10:50 AM To: user@spark.apache.org Subject: Spark 2.2.0 Win 7 64 bits Exception while deleting Spark temp dir
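(A possible workaround along those lines, not from the thread: point Spark's scratch space at a directory you fully control via the SPARK_LOCAL_DIRS environment variable or the spark.local.dir property; the path below is made up.)

    rem Windows command prompt; D:\spark-temp is a stand-in path
    set SPARK_LOCAL_DIRS=D:\spark-temp
    run-example SparkPi 10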

RE: Quick one... AWS SDK version?

2017-10-03 Thread JG Perrin
Sorry Steve - I may not have been very clear: I'm thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark. From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: Tuesday, October 03, 2017 2:20 PM To: JG Perrin Cc:

RE: Quick one... AWS SDK version?

2017-10-03 Thread JG Perrin
Thanks Yash… this is helpful! From: Yash Sharma [mailto:yash...@gmail.com] Sent: Tuesday, October 03, 2017 1:02 AM To: JG Perrin ; user@spark.apache.org Subject: Re: Quick one... AWS SDK version? Hi JG, Here are my cluster configs if it helps. Cheers. EMR: emr-5.8.0

RE: how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
HA! Yeah, in an earlier attempt, I tried to convert everything to unix_timestamp. That went over like a lead balloon… Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net

RE: how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
In my first attempt, I actually tried using case classes and then putting them into a Dataset. Scala, I guess, doesn't have a date-time data type, and I still wound up having to do some sort of conversion. When I tried to put the data into the Dataset because I still had to define the column as
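(An aside, a minimal sketch rather than anything from the thread: Spark's encoders do understand java.sql.Timestamp and java.sql.Date fields in case classes, so a Dataset column can carry a timestamp directly. Names below are invented; `spark` is a SparkSession as in spark-shell.)

    import java.sql.Timestamp
    import spark.implicits._

    // Timestamp maps to Spark's TimestampType in the Dataset schema
    case class Event(id: Long, occurredAt: Timestamp)
    val ds = spark.createDataset(Seq(Event(1L, Timestamp.valueOf("2017-10-03 10:50:00"))))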

Re: Quick one... AWS SDK version?

2017-10-03 Thread Steve Loughran
On 3 Oct 2017, at 02:28, JG Perrin wrote: Hey Sparkians, What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs? You generally have to stick with the version which Hadoop was built with. I'm
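(For illustration only, an sbt sketch of that pairing; the SDK version is an assumption to verify against your own Hadoop build — hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4.)

    // build.sbt: keep the AWS SDK at the version the Hadoop S3A client was compiled against
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-aws"   % "2.7.3",
      "com.amazonaws"     % "aws-java-sdk" % "1.7.4"  // matches hadoop-aws 2.7.x
    )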

Re: how do you deal with datetime in Spark?

2017-10-03 Thread Steve Loughran
On 3 Oct 2017, at 18:43, Adaryl Wakefield wrote: I gave myself a project to start actually writing Spark programs. I'm using Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful

Hive From Spark: Jdbc VS sparkContext

2017-10-03 Thread Nicolas Paris
Hi, I wonder about the differences between accessing Hive tables in two different ways: - with JDBC access - with the sparkContext. I would say that JDBC is better, since it uses Hive, which is based on MapReduce/Tez and therefore works on disk. Using Spark RDDs can lead to memory errors on very large datasets. Anybody
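(To make the two paths concrete — a sketch, not from the thread; the table name, JDBC URL and driver setup are invented.)

    import org.apache.spark.sql.SparkSession

    // Path 1: Spark reads the table data itself (Hive supplies only the metadata)
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
    val viaSpark = spark.sql("SELECT * FROM some_db.huge_table WHERE ds = '2017-10-03'")

    // Path 2: HiveServer2 executes the query (MR/Tez, spilling to disk) and Spark
    // pulls the result rows over JDBC
    val viaJdbc = spark.read.format("jdbc")
      .option("url", "jdbc:hive2://hiveserver:10000/some_db")
      .option("dbtable", "huge_table")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .load()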

Re: how do you deal with datetime in Spark?

2017-10-03 Thread Nicholas Hakobian
I'd suggest first converting the string containing your date/time to a TimestampType or a DateType. The built-in functions for year, month, day, etc. will then work as expected. If your date is in a "standard" format, you can perform the conversion just by casting the column to a date or
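(A minimal sketch of that suggestion; the DataFrame and column names are made up, and the cast assumes a yyyy-MM-dd style string. For non-standard formats, to_date/unix_timestamp take an explicit pattern instead of a bare cast.)

    import org.apache.spark.sql.functions.{col, year, month}

    // Cast the string column to a real date, then the date functions just work
    val withDate = df.withColumn("event_date", col("event_date_str").cast("date"))
    val counts   = withDate.groupBy(year(col("event_date")), month(col("event_date"))).count()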

Re: how do you deal with datetime in Spark?

2017-10-03 Thread Vadim Semenov
I usually check the list of Hive UDFs, as Spark has implemented almost all of them: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions And/or check `org.apache.spark.sql.functions` directly:
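(One concrete case of that overlap, as a hedged sketch — datediff exists both as a Hive UDF and in `org.apache.spark.sql.functions`; the table and column names are invented.)

    import org.apache.spark.sql.functions.{col, datediff}

    // Hive-UDF style, through SQL:
    spark.sql("SELECT datediff(end_date, start_date) AS days FROM events")

    // The same thing through the functions object:
    df.select(datediff(col("end_date"), col("start_date")).as("days"))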

Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
Hi all, I have a data pipeline using Spark Streaming, Kafka and Cassandra. Are there any libraries to help me with complex event processing using Spark Streaming? I appreciate your help. Thanks

how do you deal with datetime in Spark?

2017-10-03 Thread Adaryl Wakefield
I gave myself a project to start actually writing Spark programs. I'm using Scala and Spark 2.2.0. In my project, I had to do some grouping and filtering by dates. It was awful and took forever. I was trying to use DataFrames and SQL as much as possible. I see that there are date functions in

Re: Multiple filters vs multiple conditions

2017-10-03 Thread Vadim Semenov
Since you're using the Dataset API or RDD API, the filters won't be fused together by the Catalyst optimizer unless you use the DataFrame API. Two filters will get executed within one stage, and there'll be very little overhead in having two separate filters vs having only one. On Tue, Oct 3, 2017 at 8:14 AM, Ahmed
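(A sketch of the distinction being drawn, with an invented record type; `spark` is a SparkSession as in spark-shell. Lambda predicates are opaque to Catalyst; column expressions are not.)

    import org.apache.spark.sql.functions.col
    import spark.implicits._

    case class Rec(x: Int, y: Int)
    val ds = Seq(Rec(1, 2), Rec(3, 4)).toDS()

    // Dataset API: two opaque Scala functions, run back to back within one stage
    val a = ds.filter(t => t.x == 1).filter(t => t.y == 2)

    // DataFrame API: Catalyst can collapse these into a single Filter node
    val b = ds.toDF().filter(col("x") === 1).filter(col("y") === 2)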

Re: PySpark - Expand rows into dataframes via function

2017-10-03 Thread Sathish Kumaran Vairavelu
flatMap works too. The explode function is the SQL/DataFrame way of doing a one-to-many operation. Both should work. Thanks On Tue, Oct 3, 2017 at 8:30 AM Patrick McCarthy wrote: > Thanks Sathish. > > Before you responded, I came up with this solution: > > # A function to take in one
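(For reference, the DataFrame-side version of that one-to-many step — a Scala sketch with made-up column names, since the thread's actual schema isn't shown.)

    import org.apache.spark.sql.functions.{col, explode}

    // One input row carrying an array becomes one output row per array element
    val expanded = df.select(col("registry_id"), explode(col("ip_ranges")).as("ip_range"))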

Spark 2.2.0 Win 7 64 bits Exception while deleting Spark temp dir

2017-10-03 Thread usa usa
Hi, I have installed Spark 2.2.0 on Win 7 64 bits. When I ran a test: c:\>run-example SparkPi 10 I got the error: Exception while deleting Spark temp dir C:\Users\jding01\AppData\Local\Temp\spark-xxx The solution at

Re: PySpark - Expand rows into dataframes via function

2017-10-03 Thread Patrick McCarthy
Thanks Sathish. Before you responded, I came up with this solution:

    # A function to take in one row and return the expanded ranges:
    def processRow(x):
        ...
        return zip(list_of_ip_ranges, list_of_registry_ids)

    # and then in spark,
    processed_rdds = spark_df_of_input_data.rdd.flatMap(lambda x:

Re: Multiple filters vs multiple conditions

2017-10-03 Thread Michael Artz
Hi Ahmed, Depending on which version you have, it could matter. We received an email about multiple conditions in a filter not being picked up. I copied the email below, which was sent out to the Spark user list. The user never tried multiple one-condition filters, which might have worked. Hi

Re: Multiple filters vs multiple conditions

2017-10-03 Thread ayan guha
Remember, transformations are lazy, so nothing happens until you call an action. At that point both are the same. On Tue, Oct 3, 2017 at 11:19 PM, Femi Anthony wrote: > I would assume that the optimizer would end up transforming both to the > same expression. > > Femi > >
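(The point in miniature — assuming a Dataset `ds` of records with x and y fields, as elsewhere in this thread: building the plan costs nothing; the action pays for it.)

    val filtered = ds.filter(t => t.x == 1).filter(t => t.y == 2)  // nothing runs yet
    val n = filtered.count()  // the action triggers one stage applying both predicates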

Re: Multiple filters vs multiple conditions

2017-10-03 Thread Femi Anthony
I would assume that the optimizer would end up transforming both to the same expression. Femi Sent from my iPhone > On Oct 3, 2017, at 8:14 AM, Ahmed Mahmoud wrote: > > Hi All, > > Just a quick question from an optimisation point of view: > > Approach 1: > .filter (t->

Multiple filters vs multiple conditions

2017-10-03 Thread Ahmed Mahmoud
Hi All, Just a quick question from an optimisation point of view: Approach 1: .filter(t -> t.x == 1 && t.y == 2) Approach 2: .filter(t -> t.x == 1).filter(t -> t.y == 2) Is there a difference, or is one better than the other, or are both the same? Thanks! Ahmed Mahmoud
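(One way to check this yourself, assuming the DataFrame API — compare the optimized plans with explain(); `df` is a stand-in.)

    import org.apache.spark.sql.functions.col

    df.filter(col("x") === 1 && col("y") === 2).explain()
    df.filter(col("x") === 1).filter(col("y") === 2).explain()
    // With column expressions, both typically show a single combined Filter node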

Re: Quick one... AWS SDK version?

2017-10-03 Thread Yash Sharma
Hi JG, Here are my cluster configs if it helps. Cheers.

EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS SDK: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar
Applications: Hive 2.3.0, Spark 2.2.0, Tez 0.8.4

On Tue, 3 Oct 2017 at 12:29 JG Perrin wrote: > Hey