Re: Are there any alternatives to Hive "stored by" clause as Spark 2.0 does not support it

2018-02-08 Thread Pralabh Kumar
Mastering Spark SQL https://bit.ly/mastering-spark-sql > Spark Structured Streaming https://bit.ly/spark-structured-streaming > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams > Follow me at https://twitter.com/jaceklaskowski > > On Thu, Feb 8, 2018 at 7:25 AM, Pralabh Kumar <pr

Re: Are there any alternatives to Hive "stored by" clause as Spark 2.0 does not support it

2018-02-08 Thread Jacek Laskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski On Thu, Feb 8, 2018 at 7:25 AM, Pralabh Kumar <pralabhku...@gmail.com> wrote: > Hi > > Spark 2.0 doesn't support the Hive STORED BY clause. Is there any alternative to achieve > the same? > > >

Are there any alternatives to Hive "stored by" clause as Spark 2.0 does not support it

2018-02-07 Thread Pralabh Kumar
Hi, Spark 2.0 doesn't support the Hive STORED BY clause. Is there any alternative to achieve the same?
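
Spark SQL has no STORED BY, but tables backed by external systems can usually be declared through Spark's own data source DDL instead. A hedged sketch, assuming a Spark data source implementation exists for the target store (the table name and options below are illustrative):

    // Instead of Hive's STORED BY storage-handler DDL, declare the table
    // through Spark's data source syntax. Source class and options are
    // placeholders for whatever connector backs your table.
    spark.sql("""
      CREATE TABLE events
      USING org.apache.phoenix.spark
      OPTIONS (table 'EVENTS', zkUrl 'zk-host:2181')
    """)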

Re: spark 2.0 and spark 2.2

2018-01-22 Thread Xiao Li
. Thanks, Xiao 2018-01-22 7:07 GMT-08:00 Mihai Iacob <mia...@ca.ibm.com>: > Does spark 2.2 have good backwards compatibility? Is there something that > won't work that works in spark 2.0? > > > Regards, > > *Mihai Iacob* > DSX Local <https://datascience.ibm.com

spark 2.0 and spark 2.2

2018-01-22 Thread Mihai Iacob
Does spark 2.2 have good backwards compatibility? Is there something that won't work that works in spark 2.0? Regards, Mihai Iacob, DSX Local - Security

Re: Spark 2.0 and Oracle 12.1 error

2017-07-24 Thread Cassa L
21, 2017 at 10:12 AM, Xiao Li <gatorsm...@gmail.com> wrote: > >> Could you share the schema of your Oracle table and open a JIRA? >> >> Thanks! >> >> Xiao >> >> >> 2017-07-21 9:40 GMT-07:00 Cassa L <lcas...@gmail.com>: >> >>> I am us

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Cassa L
2017 at 10:12 AM, Xiao Li <gatorsm...@gmail.com> wrote: > Could you share the schema of your Oracle table and open a JIRA? > > Thanks! > > Xiao > > > 2017-07-21 9:40 GMT-07:00 Cassa L <lcas...@gmail.com>: > >> I am using 2.2.0. I resolved the problem by rem

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
>> >> >> On Wed, 19 Jul 2017 at 11:10 PM Cassa L <lcas...@gmail.com> wrote: >> >>> Hi, >>> I am trying to use Spark to read from Oracle (12.1) table using Spark >>> 2.0. My table h

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Cassa L
multiple Oracle related issues in the latest > release. > > Thanks > > Xiao > > > On Wed, 19 Jul 2017 at 11:10 PM Cassa L <lcas...@gmail.com> wrote: > >> Hi, >> I am trying to use Spark to read from Oracle (12.1) table using Spark >> 2.0. My table

Re: Spark 2.0 and Oracle 12.1 error

2017-07-21 Thread Xiao Li
Could you try 2.2? We fixed multiple Oracle related issues in the latest release. Thanks Xiao On Wed, 19 Jul 2017 at 11:10 PM Cassa L <lcas...@gmail.com> wrote: > Hi, > I am trying to use Spark to read from Oracle (12.1) table using Spark 2.0. > My table has JSON data. I a

Re: Spark 2.0 and Oracle 12.1 error

2017-07-20 Thread ayan guha
cas...@gmail.com> wrote: > Hi, > I am trying to use Spark to read from Oracle (12.1) table using Spark 2.0. > My table has JSON data. I am getting below exception in my code. Any clue? > > >>>>> > java.sql.SQLException: Unsupported type -101 > > at or

Spark-2.0 and Oracle 12.1 error: Unsupported type -101

2017-07-20 Thread Cassa L
Hi, I am trying to read data into Spark from Oracle using the ojdbc7 driver. The data is in JSON format. I am getting the below error. Any idea on how to resolve it? java.sql.SQLException: Unsupported type -101 at

Spark 2.0 and Oracle 12.1 error

2017-07-20 Thread Cassa L
Hi, I am trying to use Spark to read from Oracle (12.1) table using Spark 2.0. My table has JSON data. I am getting below exception in my code. Any clue? >>>>> java.sql.SQLException: Unsupported type -101 at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$a
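
For the archives: JDBC type -101 is Oracle's TIMESTAMP WITH TIME ZONE (oracle.jdbc.OracleTypes.TIMESTAMPTZ), which Spark's JDBC source does not map. A hedged sketch of one common workaround, casting the offending column inside the dbtable subquery (connection details, column, and table names are placeholders):

    // Push a cast down to Oracle so Spark never sees the unmapped type.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/orcl")
      .option("dbtable",
        "(SELECT id, CAST(created_at AS TIMESTAMP) created_at, json_doc FROM my_table)")
      .option("user", "scott")
      .option("password", "tiger")
      .load()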

Command to get list of all persisted RDDs in Spark 2.0 Scala shell

2017-06-01 Thread nancy henry
Hi Team, Please let me know how to get the list of all persisted RDDs in the Spark 2.0 shell. Regards, Nancy
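
A minimal sketch of the answer: SparkContext tracks persisted RDDs by id, so from the 2.0 shell:

    // getPersistentRDDs returns Map[Int, RDD[_]] keyed by RDD id
    spark.sparkContext.getPersistentRDDs.foreach { case (id, rdd) =>
      println(s"id=$id name=${rdd.name} storage=${rdd.getStorageLevel}")
    }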

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-21 Thread Dongjin Lee
Hi Chetan, Sadly, you cannot; Spark is configured to ignore null values when writing JSON. (Check JacksonMessageWriter and find JsonInclude.Include.NON_NULL in the code.) If you want that functionality, it would be much better to file the problem in JIRA. Best, Dongjin On Mon, Mar 20,

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-20 Thread Chetan Khatri
Exactly. On Sat, Mar 11, 2017 at 1:35 PM, Dongjin Lee wrote: > Hello Chetan, > > Could you post some code? If I understood correctly, you are trying to > save JSON like: > > { > "first_name": "Dongjin", > "last_name": null > } > > not in omitted form, like: > > { >

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-11 Thread Dongjin Lee
Hello Chetan, Could you post some code? If I understood correctly, you are trying to save JSON like: { "first_name": "Dongjin", "last_name": null } not in omitted form, like: { "first_name": "Dongjin" } right? - Dongjin On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri

Issues: Generate JSON with null values in Spark 2.0.x

2017-03-07 Thread Chetan Khatri
Hello Dev / Users, I am working on migrating PySpark code to Scala. With Python, iterating over a dictionary and generating JSON with nulls is possible with json.dumps(), which will be converted to SparkSQL[Row]; but in Scala, how can we generate JSON with null values as a DataFrame? Thanks.
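
Since Spark's JSON writer drops null fields, one workaround is to serialize each row yourself; Jackson keeps nulls by default. A hedged sketch assuming a flat schema (nested rows would need recursive handling; the output path is illustrative):

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    val jsonRDD = df.rdd.mapPartitions { rows =>
      // one mapper per partition; ObjectMapper is not serializable
      val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
      rows.map { row =>
        // getValuesMap keeps null values, unlike df.write.json
        mapper.writeValueAsString(row.getValuesMap[Any](row.schema.fieldNames))
      }
    }
    jsonRDD.saveAsTextFile("/tmp/json-with-nulls")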

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
How about running this - select * from (select *, count(*) over (partition by id) c from filteredDS) f where f.c < 75000 On Sun, Mar 5, 2017 at 12:05 PM, Ankur Srivastava < ankur.srivast...@gmail.com> wrote: > Yes every time I run this code with production scale data it fails.

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread Ankur Srivastava
Yes every time I run this code with production scale data it fails. Test case with small dataset of 50 records on local box runs fine. Thanks Ankur Sent from my iPhone > On Mar 4, 2017, at 12:09 PM, ayan guha wrote: > > Just to be sure, can you reproduce the error using

Re: Spark 2.0 issue with left_outer join

2017-03-04 Thread ayan guha
Just to be sure, can you reproduce the error using sql api? On Sat, 4 Mar 2017 at 2:32 pm, Ankur Srivastava wrote: > Adding DEV. > > Or is there any other way to do subtractByKey using Dataset APIs? > > Thanks > Ankur > > On Wed, Mar 1, 2017 at 1:28 PM, Ankur

Re: Spark 2.0 issue with left_outer join

2017-03-03 Thread Ankur Srivastava
Adding DEV. Or is there any other way to do subtractByKey using Dataset APIs? Thanks Ankur On Wed, Mar 1, 2017 at 1:28 PM, Ankur Srivastava wrote: > Hi Users, > > We are facing an issue with left_outer join using Spark Dataset api in 2.0 > Java API. Below is the
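
For the archives: the usual Dataset-API replacement for RDD.subtractByKey is a left_anti join on the key column. A minimal sketch using the names from the code below (the join condition is illustrative):

    val goodRecords = filteredDS.join(badIds,
      filteredDS("id") === badIds("bid"), "left_anti")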

Spark 2.0 issue with left_outer join

2017-03-01 Thread Ankur Srivastava
Hi Users, We are facing an issue with left_outer join using Spark Dataset api in 2.0 Java API. Below is the code we have Dataset<Row> badIds = filteredDS.groupBy(col("id").alias("bid")).count() .filter((FilterFunction<Row>) row -> (Long) row.getAs("count") > 75000); _logger.info("Id count with

Re: question on transforms for spark 2.0 dataset

2017-03-01 Thread Bill Schwanitz
Subhash, Yea that did the trick thanks! On Wed, Mar 1, 2017 at 12:20 PM, Subhash Sriram wrote: > If I am understanding your problem correctly, I think you can just create > a new DataFrame that is a transformation of sample_data by first > registering sample_data as a

Re: question on transforms for spark 2.0 dataset

2017-03-01 Thread Subhash Sriram
If I am understanding your problem correctly, I think you can just create a new DataFrame that is a transformation of sample_data by first registering sample_data as a temp table. //Register temp table sample_data.createOrReplaceTempView("sql_sample_data") //Create new DataSet with transformed
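
A short sketch completing the truncated snippet above (the column names and transformation are illustrative):

    sample_data.createOrReplaceTempView("sql_sample_data")
    val transformed = spark.sql(
      "SELECT id, UPPER(name) AS name, amount * 100 AS cents FROM sql_sample_data")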

Re: question on transforms for spark 2.0 dataset

2017-03-01 Thread Marco Mistroni
Hi, I think you need a UDF if you want to transform a column. HTH. On 1 Mar 2017 4:22 pm, "Bill Schwanitz" wrote: > Hi all, > > I'm fairly new to spark and scala so bear with me. > > I'm working with a dataset containing a set of column / fields. The data > is stored in hdfs as
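
A hedged sketch of the UDF route suggested here (the column name and transformation are made up):

    import org.apache.spark.sql.functions.{col, udf}

    val normalize = udf((s: String) => if (s == null) null else s.trim.toLowerCase)
    val result = sample_data.withColumn("name", normalize(col("name")))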

question on transforms for spark 2.0 dataset

2017-03-01 Thread Bill Schwanitz
Hi all, I'm fairly new to spark and scala so bear with me. I'm working with a dataset containing a set of column / fields. The data is stored in hdfs as parquet and is sourced from a postgres box so fields and values are reasonably well formed. We are in the process of trying out a switch from

Re: My spark job runs faster in spark 1.6 and much slower in spark 2.0

2017-02-14 Thread arun kumar Natva
com> wrote: > >> Hi, >> I am reading an ORC file, and perform some joins, aggregations and finally >> generate a dense vector to perform analytics. >> >> The code runs in 45 minutes on spark 1.6 on a 4 node cluster. When the >> same >> code is mig

Re: My spark job runs faster in spark 1.6 and much slower in spark 2.0

2017-02-14 Thread Jörn Franke
file, and perform some joins, aggregations and finally >> generate a dense vector to perform analytics. >> >> The code runs in 45 minutes on spark 1.6 on a 4 node cluster. When the same >> code is migrated to run on spark 2.0 on the same cluster, it takes around >> 4-5 h

Re: My spark job runs faster in spark 1.6 and much slower in spark 2.0

2017-02-14 Thread Timur Shenkao
reading an ORC file, and perform some joins, aggregations and finally > generate a dense vector to perform analytics. > > The code runs in 45 minutes on spark 1.6 on a 4 node cluster. When the same > code is migrated to run on spark 2.0 on the same cluster, it takes around >

My spark job runs faster in spark 1.6 and much slower in spark 2.0

2017-02-14 Thread anatva
Hi, I am reading an ORC file, performing some joins and aggregations, and finally generating a dense vector to perform analytics. The code runs in 45 minutes on Spark 1.6 on a 4 node cluster. When the same code is migrated to run on Spark 2.0 on the same cluster, it takes around 4-5 hours

Re: Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread Cody Koeninger
Pretty sure there was no 0.10.0.2 release of apache kafka. If that's a hortonworks modified version you may get better results asking in a hortonworks specific forum. Scala version of kafka shouldn't be relevant either way though. On Wed, Feb 8, 2017 at 5:30 PM, u...@moosheimer.com

Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread u...@moosheimer.com
Dear devs, is it possible to use Spark 2.0.2 (Scala 2.11) and consume messages from a Kafka 0.10.0.2 server running on Scala 2.10? I tried this for the last two days using createDirectStream and can't get any messages out of Kafka. I'm using HDP 2.5.3 running kafka_2.10-0.10.0.2.5.3.0-37 and Spark
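
For completeness, a minimal sketch of the 0-10 direct stream setup being attempted (the broker, topic, and group id are placeholders; ssc is an existing StreamingContext):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:6667",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "test-group",
      "auto.offset.reset" -> "earliest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))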

RE: Jars directory in Spark 2.0

2017-02-01 Thread Sidney Feiner
Feiner <sidney.fei...@startapp.com> Cc: Koert Kuipers <ko...@tresata.com>; user@spark.apache.org Subject: Re: Jars directory in Spark 2.0 Spark has never shaded dependencies (in the sense of renaming the classes), with a couple of exceptions (Guava and Jetty). So that behavior is not

Re: Jars directory in Spark 2.0

2017-02-01 Thread Marcelo Vanzin
idney.fei...@startapp.com> > *Cc:* user@spark.apache.org > *Subject:* Re: Jars directory in Spark 2.0 > > > > you basically have to keep your versions of dependencies in line with > sparks or shade your own dependencies. > > you cannot just replace the jars in sparks jars

RE: Jars directory in Spark 2.0

2017-01-31 Thread Sidney Feiner
Kuipers [mailto:ko...@tresata.com] Sent: Tuesday, January 31, 2017 7:26 PM To: Sidney Feiner <sidney.fei...@startapp.com> Cc: user@spark.apache.org Subject: Re: Jars directory in Spark 2.0 you basically have to keep your versions of dependencies in line with sparks or shade your own depend

Re: Jars directory in Spark 2.0

2017-01-31 Thread Koert Kuipers
you basically have to keep your versions of dependencies in line with Spark's or shade your own dependencies. you cannot just replace the jars in Spark's jars folder. if you want to update them you have to build Spark yourself with updated dependencies and confirm it compiles, passes tests etc. On

Jars directory in Spark 2.0

2017-01-31 Thread Sidney Feiner
Hey, While migrating to Spark 2.X from 1.6, I've had many issues with jars that come preloaded with Spark in the "jars/" directory and I had to shade most of my packages. Can I replace the jars in this folder with more up-to-date versions? Are those jars used for anything internal in Spark which
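
A hedged sketch of the shading advice from this thread, as an sbt-assembly fragment for build.sbt: rename your own copy of a conflicting package rather than swapping jars under Spark's jars/ directory (the Guava rename below is illustrative):

    assemblyShadeRules in assembly := Seq(
      // relocate your bundled Guava so it cannot clash with Spark's
      ShadeRule.rename("com.google.common.**" -> "myshaded.guava.@1").inAll
    )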

Re: Spark 2.0 vs MongoDb /Cannot find dependency using sbt

2017-01-16 Thread Marco Mistroni
sorry, should have done more research before jumping to the list. the version of the connector is 2.0.0, available from Maven repos. sorry On Mon, Jan 16, 2017 at 9:32 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > HI all > in searching on how to use Spark 2.0 with mongo i
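
For reference, a build.sbt sketch of the dependency that resolved it (pick the artifact version matching your Scala and Spark versions):

    libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.0.0"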

Spark 2.0 vs MongoDb /Cannot find dependency using sbt

2017-01-16 Thread Marco Mistroni
HI all, in searching how to use Spark 2.0 with mongo I came across this link https://jira.mongodb.org/browse/SPARK-20 I amended my build.sbt (content below), however the mongodb dependency was not found. Could anyone assist? kr marco name := "SparkExamples" version := "1.

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
2.1.0 still have any issues w.r.t. stability? > > > > Regards, > > Ankur > > > > *From:* Md. Rezaul Karim [mailto:rezaul.ka...@insight-centre.org] > *Sent:* Monday, January 09, 2017 5:02 PM > *To:* Ankur Jain <ankur.j...@yash.com> > *Cc:* user@spark.apache.org

RE: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Ankur Jain
Spark 1.6 vs Spark 2.0 Hello Jain, I would recommend using Spark MLlib <http://spark.apache.org/docs/latest/ml-guide.html> (and ML) of Spark 2.1.0 with the following features: * ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborati

Re: Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Md. Rezaul Karim
Hello Jain, I would recommend using Spark MLlib (and ML) of *Spark 2.1.0* with the following features: - ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering -

Machine Learning in Spark 1.6 vs Spark 2.0

2017-01-09 Thread Ankur Jain
Hi Team, I want to start a new project with ML, but wanted to know which version of Spark is more stable and has more features w.r.t. ML. Please suggest your opinion... Thanks in Advance... Thanks & Regards Ankur Jain Technical Architect - Big Data | IoT |

Re: Kafka 0.8 + Spark 2.0 Partition Issue

2017-01-06 Thread Cody Koeninger
Kafka is designed to only allow reads from leaders. You need to fix this at the kafka level not the spark level. On Fri, Jan 6, 2017 at 7:33 AM, Raghu Vadapalli <raghuvadapa...@aol.com> wrote: > > My spark 2.0 + kafka 0.8 streaming job fails with error partition leaderset > ex

Kafka 0.8 + Spark 2.0 Partition Issue

2017-01-06 Thread Raghu Vadapalli
My Spark 2.0 + Kafka 0.8 streaming job fails with a partition leaderset exception. When I check the partition of the Kafka topic, it is indeed in error, with Leader = -1 and an empty ISR. I did a lot of googling and all results point to either restarting or deleting the topic. To do any of those

Re: Why does Spark 2.0 change number of partitions when reading a parquet file?

2016-12-22 Thread Daniel Siegmann
Spark 2.0.0 introduced "Automatic file coalescing for native data sources" ( http://spark.apache.org/releases/spark-release-2-0-0.html#performance-and-runtime). Perhaps that is the cause? I'm not sure if this feature is mentioned anywhere in the documentation or if there's any way to disable it.

Why does Spark 2.0 change number of partitions when reading a parquet file?

2016-12-22 Thread Kristina Rogale Plazonic
Hi, I write a randomly generated 30,000-row dataframe to parquet. I verify that it has 200 partitions (both in Spark and inspecting the parquet file in hdfs). When I read it back in, it has 23 partitions?! Is there some optimization going on? (This doesn't happen in Spark 1.5) *How can I force
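
If the automatic file coalescing described in the reply above is the cause, repartitioning after the read restores the parallelism. A minimal sketch (the path and partition count are illustrative):

    val df = spark.read.parquet("/data/my_table").repartition(200)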

Re: About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Calvin Jia
Hi, Alluxio will allow you to share or cache data in-memory between different Spark contexts by storing RDDs or Dataframes as a file in the Alluxio system. The files can then be accessed by any Spark job like a file in any other distributed storage system. These two blogs do a good job of
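
A hedged sketch of the pattern described above; the Alluxio master address and path are placeholders:

    // application A persists a DataFrame as a file in Alluxio
    df.write.parquet("alluxio://alluxio-master:19998/shared/events")

    // application B, with its own SparkContext, reads it back
    val shared = spark.read.parquet("alluxio://alluxio-master:19998/shared/events")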

About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Chetan Khatri
Hello Guys, What would be the approach to accomplish Spark Multiple Shared Context without Alluxio and with Alluxio, and what would be the best practice to achieve parallelism and concurrency for Spark jobs? Thanks. -- Yours Aye, Chetan Khatri. M.+91 7 80574 Data Science Researcher INDIA

Fwd: [Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread w.zhaokang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid shuffle thus bringing us high join performance. In the new Dataset API in Spark 2.0, is the high performance shuffle-free join by co-partition mechanism still feasible? I have looked through the API doc

[Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

2016-12-01 Thread Dale Wang
Hi all, In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid shuffle thus bringing us high join performance. In the new Dataset API in Spark 2.0, is the high performance shuffle-free join by co-partition mechanism still feasible? I have looked through the API doc
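
One hedged answer: in Spark 2.0 the closest Dataset-side analogue to co-partitioned PairRDDs is bucketed tables; two tables bucketed identically on the join key can let the planner skip the exchange. A sketch under that assumption (table names and bucket count are illustrative):

    left.write.bucketBy(16, "key").sortBy("key").saveAsTable("left_b")
    right.write.bucketBy(16, "key").sortBy("key").saveAsTable("right_b")
    val joined = spark.table("left_b").join(spark.table("right_b"), "key")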

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Reynold Xin
6080. >>> >>> Thanks, >>> >>> Yin >>> >>> On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao <t...@timshenkao.su> >>> wrote: >>> >>>> Hi! >>>> >>>> Do you have real HIVE installation? >>>&

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Timur Shenkao
https://github.com/apache/spark/pull/16080. >> >> Thanks, >> >> Yin >> >> On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao <t...@timshenkao.su> >> wrote: >> >>> Hi! >>> >>> Do you have real HIVE installation? >>> Have you built S

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Gourav Sengupta
Thank you for reporting this issue. It will be fixed by > https://github.com/apache/spark/pull/16080. > > Thanks, > > Yin > > On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao <t...@timshenkao.su> wrote: > >> Hi! >> >> Do you have real HIVE installation? >> Have

SPARK 2.0 CSV exports (https://issues.apache.org/jira/browse/SPARK-16893)

2016-11-30 Thread Gourav Sengupta
Hi Sean, I think that the main issue was users importing the package while starting SPARK just like the way we used to do in SPARK 1.6. After removing that option from --package while starting SPARK 2.0 the issue of conflicting libraries disappeared. I have written about this in https

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-30 Thread Yin Huai
Hello Michael, Thank you for reporting this issue. It will be fixed by https://github.com/apache/spark/pull/16080. Thanks, Yin On Tue, Nov 29, 2016 at 11:34 PM, Timur Shenkao <t...@timshenkao.su> wrote: > Hi! > > Do you have real HIVE installation? > Have you built Spa

Re: Can't read tables written in Spark 2.1 in Spark 2.0 (and earlier)

2016-11-29 Thread Timur Shenkao
Hi! Do you have a real HIVE installation? Have you built Spark 2.1 & Spark 2.0 with HIVE support ( -Phive -Phive-thriftserver )? It seems that you use Spark's "default" HIVE 1.2.1. Your metadata is stored in a local Derby DB which is visible to a concrete Spark installation but not fo

Pasting oddity with Spark 2.0 (scala)

2016-11-14 Thread jggg777
This one has stumped the group here, hoping to get some insight into why this error is happening. I'm going through the Databricks DataFrames scala docs

Hive Queries are running very slowly in Spark 2.0

2016-11-09 Thread Jaya Shankar Vadisela
Hi ALL, I have the simple HIVE query below; we have a use-case where we run multiple HIVE queries in parallel, in our case 16 (the number of cores on our machine), using a Scala par array. In Spark 1.6 it executes in 10 secs but in Spark 2.0 the same queries take 5 mins. "select * fro

Re: Do you use spark 2.0 in work?

2016-10-31 Thread Andy Dang
This is my personal email so I can't exactly discuss work-related topics. But yes, many teams in my company use Spark 2.0 in production environment. What are the challenges that prevent you from adopting it (besides migration from Spark 1.x)? --- Regards, Andy On Mon, Oct 31, 2016 at 8:16

Do you use spark 2.0 in work?

2016-10-31 Thread Yang Cao
Hi guys, Just out of personal interest: I wonder whether Spark 2.0 is a production-ready version. Is there any company using it as its main version in daily work? THX

Re: Spark 2.0 with Hadoop 3.0?

2016-10-30 Thread adam kramer
releases? > > Is there any reason why Hadoop 3.0 is a non-starter for use with Spark > 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which > would resolve our driver dependency issues. > > what version problems are you having there? > > There's a pat

Re: Spark 2.0 with Hadoop 3.0?

2016-10-29 Thread Steve Loughran
On 27 Oct 2016, at 23:04, adam kramer <ada...@gmail.com<mailto:ada...@gmail.com>> wrote: Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases? Is there any reason why Hadoop 3.0 is a non-starter for use with Spark 2.0? The version of aws-sdk in 3.0 ac

Re: Spark 2.0 with Hadoop 3.0?

2016-10-28 Thread Zoltán Zvara
be somewhat different API-wise. > > On Thu, Oct 27, 2016 at 11:04 PM adam kramer <ada...@gmail.com> wrote: > > Is the version of Spark built for Hadoop 2.7 and later only for 2.x > releases? > > Is there any reason why Hadoop 3.0 is a non-starter for use with Spark > 2.0? Th

Re: Spark 2.0 with Hadoop 3.0?

2016-10-28 Thread Sean Owen
> releases? > > Is there any reason why Hadoop 3.0 is a non-starter for use with Spark > 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which > would resolve our driver dependency issues. > > Thanks, > Adam > > -

Spark 2.0 with Hadoop 3.0?

2016-10-27 Thread adam kramer
Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases? Is there any reason why Hadoop 3.0 is a non-starter for use with Spark 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which would resolve our driver dependency issues. Thanks, Adam

Spark 2.0 on HDP

2016-10-27 Thread Deenar Toraskar
Hi, Has anyone tried running Spark 2.0 on HDP? I have managed to get around the issues with the timeline service (by turning it off), but am now stuck because YARN cannot find org.apache.spark.deploy.yarn.ExecutorLauncher. Error: Could not find or load main class

RE: Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Mendelson, Assaf
PM To: Antoaneta Marinova Cc: user Subject: Re: Spark 2.0 - DataFrames vs Dataset performance Hi Antoaneta, I believe the difference is not due to Datasets being slower (DataFrames are just an alias to Datasets now), but rather using a user defined function for filtering vs using Spark builtins

Re: Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Daniel Darabos
I'm wrong. On Mon, Oct 24, 2016 at 2:50 PM, Antoaneta Marinova < antoaneta.vmarin...@gmail.com> wrote: > Hello, > > I am using Spark 2.0 for performing filtering, grouping and counting > operations on events data saved in parquet files. As the events schema has > very nested

Accessing Phoenix table from Spark 2.0, any cure!

2016-10-24 Thread Mich Talebzadeh
My stack is this: Spark 2.0.0; ZooKeeper 3.4.6; HBase hbase-1.2.3; Phoenix apache-phoenix-4.8.1-HBase-1.2-bin. I am running this simple code: scala> val df = sqlContext.load("org.apache.phoenix.spark", | Map("table" -> "MARKETDATAHBASE", "zkUrl" -> "rhes564:2181") | )

Re: reading info from spark 2.0 application UI

2016-10-24 Thread Sean Owen
What matters in this case is how many vcores YARN thinks it can allocate per machine. I think the relevant setting is yarn.nodemanager.resource.cpu-vcores. I bet you'll find this is actually more than the machine's number of cores, possibly on purpose, to enable some over-committing. On Mon, Oct

Re: reading info from spark 2.0 application UI

2016-10-24 Thread Sean Owen
If you're really sure that 4 executors are on 1 machine, then it means your resource manager allowed it. What are you using, YARN? check that you really are limited to 40 cores per machine in the YARN config. On Mon, Oct 24, 2016 at 3:33 PM TheGeorge1918 . wrote: > Hi

Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Antoaneta Marinova
Hello, I am using Spark 2.0 for performing filtering, grouping and counting operations on events data saved in parquet files. As the events schema has very nested structure I wanted to read them as scala beans to simplify the code but I noticed a severe performance degradation. Below you can find
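
A minimal sketch of the distinction raised in the replies above: a Column-based filter stays inside the optimizer, while a typed lambda forces each row to be deserialized into a bean (the schema and predicate are illustrative):

    import spark.implicits._

    case class Event(category: String, value: Long)

    val df = spark.read.parquet("/data/events")
    df.filter($"category" === "purchase").count()   // builtin predicate: no bean materialization

    val ds = df.as[Event]
    ds.filter(_.category == "purchase").count()     // typed lambda: deserializes every row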

Using a Custom Data Store with Spark 2.0

2016-10-24 Thread Sachith Withana
Hi all, I have a requirement to integrate a custom data store to be used with Spark ( v2.0.1). It consists of structured data in tables along with the schemas. Then I want to run SparkSQL queries on the data and provide the data back to the data service. I'm wondering what would be the best way

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-21 Thread Cody Koeninger
>> >> > >> >> >> >> > 16/09/07 16:00:00 INFO Executor: Running task 1.0 in stage >> >> >> >> > 138.0 >> >> >> >> > (TID >> >> >> >> > 7849) >> >>

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-21 Thread Srikanth
>> >> > 16/09/07 16:00:00 INFO KafkaRDD: Computing topic mt_event, > >> >> >> > partition > >> >> >> > 0 > >> >> >> > offsets 57098866 -> 57109957 > >> >> >> > 16/09/07 16:00:00 INFO Executo

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-20 Thread Cody Koeninger
sult sent to driver >> >> >> > 16/09/07 16:00:02 ERROR Executor: Exception in task 1.0 in stage >> >> >> > 138.0 >> >> >> > (TID >> >> >> > 7849) >> >> >> > java.lang.AssertionError: assertion fa

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-20 Thread Srikanth
>> >> > > >> >> > > >> >> > org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74) > >> >> > at > >> >> > > >> >> > org.ap

Re: Mlib RandomForest (Spark 2.0) predict a single vector

2016-10-20 Thread jglov
I would also like to know if there is a way to predict a single vector with the new spark.ml API, although in my case it's because I want to do this within a map() to avoid calling groupByKey() after a flatMap(): *Current code (pyspark):* % Given 'model', 'rdd', and a function 'split_element'
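
A hedged Scala sketch of the single-vector case: the RDD-based mllib model exposes a public predict(Vector), so it can be called inside a map (in the 2.x spark.ml API the single-vector predict is not public, which is why the mllib model is the usual route). extractFeatures is a hypothetical helper:

    import org.apache.spark.mllib.linalg.Vectors

    // model: org.apache.spark.mllib.tree.model.RandomForestModel
    val predictions = rdd.map { element =>
      val features = Vectors.dense(extractFeatures(element))  // hypothetical
      (element, model.predict(features))
    }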

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-19 Thread Cody Koeninger
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193) >> >> > >> >> > 16/09/07 16:00:02 INFO CoarseGrainedExecutorBackend: Got assigned >> >> > task >> >> > 7854 >> >> > 16/09/07 16:00:02 INFO Executor: Runni

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-19 Thread Srikanth
r: Initial fetch for > >> > spark-executor-StreamingPixelCount1 mt_event 0 57098866 > >> > > >> > 16/09/07 16:00:03 INFO Executor: Finished task 1.1 in stage 138.0 (TID > >> > 7854). 1103 bytes result sent to driver > >> > > >> > > >

DataFrame/Dataset join not producing correct results in Spark 2.0/Yarn

2016-10-12 Thread shankinson
Hi, We have a cluster running Apache Spark 2.0 on Hadoop 2.7.2, Centos 7.2. We had written some new code using the Spark DataFrame/DataSet APIs but are noticing incorrect results on a join after writing and then reading data to Windows Azure Storage Blobs (The default HDFS location). I've been

Spark 2.0 Encoder().schema() is sorting StructFields

2016-10-12 Thread Paul Stewart
Hi all, I am using Spark 2.0 to read a CSV file into a Dataset in Java. This works fine if I define the StructType with the StructField array ordered by hand. What I would like to do is use a bean class for both the schema and the Dataset row type. For example, Dataset beanDS = spark.read
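
A hedged sketch of one workaround: keep the bean encoder for the row type but hand the reader an explicitly ordered schema, since Encoders.bean sorts fields alphabetically (the bean class and columns are illustrative):

    import org.apache.spark.sql.Encoders
    import org.apache.spark.sql.types._

    val csvSchema = StructType(Seq(         // order matches the CSV columns
      StructField("name", StringType),
      StructField("age", IntegerType)))

    val beanDS = spark.read.schema(csvSchema).csv("/data/people.csv")
      .as(Encoders.bean(classOf[Person]))   // Person is a hypothetical bean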

Re: Manually committing offset in Spark 2.0 with Kafka 0.10 and Java

2016-10-11 Thread Cody Koeninger
one layer via rdd.rdd() >> >> If anyone wants to work on a PR to update the java examples in the >> docs for the 0-10 version, I'm happy to help. >> >> On Mon, Oct 10, 2016 at 10:34 AM, static-max <flasha...@googlemail.com> >> wrote: >> > Hi, >&

Re: Manually committing offset in Spark 2.0 with Kafka 0.10 and Java

2016-10-11 Thread static-max
via rdd.rdd() > > If anyone wants to work on a PR to update the java examples in the > docs for the 0-10 version, I'm happy to help. > > On Mon, Oct 10, 2016 at 10:34 AM, static-max <flasha...@googlemail.com> > wrote: > > Hi, > > > > by following this art

Re: Manually committing offset in Spark 2.0 with Kafka 0.10 and Java

2016-10-10 Thread Cody Koeninger
the java examples in the docs for the 0-10 version, I'm happy to help. On Mon, Oct 10, 2016 at 10:34 AM, static-max <flasha...@googlemail.com> wrote: > Hi, > > by following this article I managed to consume messages from Kafka 0.10 in > Spark 2.0: > http://spark.apache.org/

Manually committing offset in Spark 2.0 with Kafka 0.10 and Java

2016-10-10 Thread static-max
Hi, by following this article I managed to consume messages from Kafka 0.10 in Spark 2.0: http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html However, the Java examples are missing and I would like to commit the offset myself after processing the RDD. Does anybody have
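
For the archives, the Scala shape of the manual-commit pattern from the 0-10 integration guide (the Java version additionally unwraps the RDD via rdd.rdd(), as noted in the replies above):

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch ...
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }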

Re: Package org.apache.spark.annotation no longer exist in Spark 2.0?

2016-10-04 Thread Jakob Odersky
2016 at 10:33 AM, Liren Ding <sky.gonna.bri...@gmail.com> wrote: > I just upgraded from Spark 1.6.1 to 2.0, and got a Java compile error: > error: cannot access DeveloperApi > class file for org.apache.spark.annotation.DeveloperApi not found > > From the Spark 2.0 document &

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.04

2016-10-02 Thread Marco Mistroni
> Try shutting down zinc. Something's funny about your compile server. > It's not required anyway. > > On Sat, Oct 1, 2016 at 3:24 PM, Marco Mistroni <mmistr...@gmail.com> > wrote: > > Hi guys > > sorry to annoy you on this but I am getting nowhere. So far I

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.04

2016-10-01 Thread Sean Owen
e. So far I have tried to > build Spark 2.0 on my local laptop with no success, so I blamed my > laptop's poor performance > So today I fired off an EC2 Ubuntu 16.04 instance and installed the > following (I copy-paste commands here) > > ubuntu@ip-172-31-40-104:~/spark$ history >

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.04

2016-10-01 Thread Marco Mistroni
Hi guys, sorry to annoy you on this but I am getting nowhere. So far I have tried to build Spark 2.0 on my local laptop with no success, so I blamed my laptop's poor performance. So today I fired off an EC2 Ubuntu 16.04 instance and installed the following (I copy-paste commands here) ubuntu@ip

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-30 Thread Marco Mistroni
Hi all, this problem is still bothering me. Here's my setup - Ubuntu 16.04 - Java 8 - Spark 2.0 - I have launched the following command: ./build/mvn -X -Pyarn -Phadoop-2.7 -DskipTests clean package and I am getting this exception: org.apache.maven.lifecycle.LifecycleExecutionException: Failed

Re: Spark 2.0 issue

2016-09-29 Thread Xiao Li
Hi, Ashish, Will take a look at this soon. Thanks for reporting this, Xiao 2016-09-29 14:26 GMT-07:00 Ashish Shrowty : > If I try to inner-join two dataframes which originated from the same initial > dataframe that was loaded using spark.sql() call, it results in an

Spark 2.0 issue

2016-09-29 Thread Ashish Shrowty
If I try to inner-join two dataframes which originated from the same initial dataframe that was loaded using spark.sql() call, it results in an error - // reading from Hive .. the data is stored in Parquet format in Amazon S3 val d1 = spark.sql("select * from ") val df1 =
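
A hedged sketch of a common workaround for this kind of self-join ambiguity: alias both branches so the join condition refers to distinct lineages (df2 and the column names are illustrative):

    import org.apache.spark.sql.functions.col

    val l = df1.alias("l")
    val r = df2.alias("r")
    val joined = l.join(r, col("l.id") === col("r.id"))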

Question about single/multi-pass execution in Spark-2.0 dataset/dataframe

2016-09-27 Thread Spark User
is large, in the order of millions of records per batch 3) I'm using Spark 2.0 The above implementation doesn't seem to be efficient at all, since the dataset goes through the rows once for every count aggregation when computing attr1Counts, attr2Counts and attr3Counts. I'm concerned about the performance
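
A minimal sketch of the single-pass alternative: compute all the counts in one agg() call (column names follow the post above; batchDF is a hypothetical name for the per-batch DataFrame):

    import org.apache.spark.sql.functions.{col, count}

    val counts = batchDF.agg(
      count(col("attr1")).as("attr1Count"),
      count(col("attr2")).as("attr2Count"),
      count(col("attr3")).as("attr3Count"))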

Re: Tutorial error - zeppelin 0.6.2 built with spark 2.0 and mapr

2016-09-26 Thread Nirav Patel
FYI, it works when I use the MapR-configured Spark 2.0, i.e. export SPARK_HOME=/opt/mapr/spark/spark-2.0.0-bin-without-hadoop Thanks Nirav On Mon, Sep 26, 2016 at 3:45 PM, Nirav Patel <npa...@xactlycorp.com> wrote: > Hi, > > I built the zeppelin 0.6 branch using spark 2.0 usin

Tutorial error - zeppelin 0.6.2 built with spark 2.0 and mapr

2016-09-26 Thread Nirav Patel
Hi, I built the zeppelin 0.6 branch using Spark 2.0 with the following mvn invocation: mvn clean package -Pbuild-distr -Pmapr41 -Pyarn -Pspark-2.0 -Pscala-2.11 -DskipTests The build was successful. I only have the following set in zeppelin-conf.sh: export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.5.1/ export
