Re: Question / issue while creating a parquet file using a text file with spark 2.0...

2016-07-28 Thread Muthu Jayakumar
Hello Dong Meng, Thanks for the tip. But I do have code in place that looks like this... StructField(columnName, getSparkDataType(dataType), nullable = true) Maybe I am missing something else. The same code works fine with Spark 1.6.2, though. On a side note, I could be using SparkSession, but

Re: Question / issue while creating a parquet file using a text file with spark 2.0...

2016-07-28 Thread Dong Meng
you can specify nullable in StructField On Thu, Jul 28, 2016 at 9:14 PM, Muthu Jayakumar wrote: > Hello there, > > I am using Spark 2.0.0 to create a parquet file using a text file with > Scala. I am trying to read a text file with bunch of values of type string > and long
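[Editor's note] A minimal pure-Python sketch of what the nullable flag in a field like StructField(columnName, dataType, nullable = true) declares. The schema spec and validate helper below are hypothetical illustrations of the contract, not Spark API:

```python
# Hypothetical stand-in for a Spark schema: (field name, Python type, nullable).
# A nullable field accepts None; a non-nullable one rejects it -- the same
# contract StructField's nullable flag declares.
SCHEMA = [("name", str, True), ("count", int, False)]

def validate(row, schema):
    for value, (field, typ, nullable) in zip(row, schema):
        if value is None:
            if not nullable:
                return False  # null in a non-nullable field
        elif not isinstance(value, typ):
            return False      # wrong type for this field
    return True

print(validate((None, 3), SCHEMA))    # True  -- "name" may be null
print(validate(("a", None), SCHEMA))  # False -- "count" may not
```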

Re: Spark 2.0 Build Failed

2016-07-28 Thread Ascot Moss
I just ran wget https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom and can get it without issue. On Fri, Jul 29, 2016 at 1:44 PM, Ascot Moss wrote: > Hi thanks! > > mvn dependency:tree > > [INFO] Scanning for projects... > > Downloading: >

Re: Custom Image RDD and Sequence Files

2016-07-28 Thread Jörn Franke
Why don't you write your own Hadoop FileInputFormat. It can be used by Spark... > On 28 Jul 2016, at 20:04, jtgenesis wrote: > > Hey all, > > I was wondering what the best course of action is for processing an image > that has an involved internal structure (file headers,

Re: Spark 2.0 Build Failed

2016-07-28 Thread Ascot Moss
Hi thanks! mvn dependency:tree [INFO] Scanning for projects... Downloading: https://repo1.maven.org/maven2/org/apache/apache/14/apache-14.pom [ERROR] [ERROR] Some problems were encountered while processing the POMs: [FATAL] Non-resolvable parent POM for

Re: Spark 2.0 Build Failed

2016-07-28 Thread Dong Meng
Before build, first do a "mvn dependency:tree" to make sure the dependency is right On Thu, Jul 28, 2016 at 10:18 PM, Ascot Moss wrote: > Thanks for your reply. > > Is there a way to find the correct Hadoop profile name? > > On Fri, Jul 29, 2016 at 7:06 AM, Sean Owen

Re: Spark 2.0 Build Failed

2016-07-28 Thread Ascot Moss
Thanks for your reply. Is there a way to find the correct Hadoop profile name? On Fri, Jul 29, 2016 at 7:06 AM, Sean Owen wrote: > You have at least two problems here: wrong Hadoop profile name, and > some kind of firewall interrupting access to the Maven repo. It's not >

Question / issue while creating a parquet file using a text file with spark 2.0...

2016-07-28 Thread Muthu Jayakumar
Hello there, I am using Spark 2.0.0 to create a parquet file using a text file with Scala. I am trying to read a text file with a bunch of values of type string and long (mostly). And all the occurrences can be null. In order to support nulls, all the values are boxed with Option (ex:-

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-28 Thread Phuong LE-HONG
Hi, I've developed a simple ML estimator (in Java) that implements conditional Markov model for sequence labelling in Vitk toolkit. You can check it out here: https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java Phuong Le-Hong On Fri, Jul 29, 2016 at 9:01 AM,

Re: spark run shell On yarn

2016-07-28 Thread Marcelo Vanzin
Well, it's more of an unfortunate incompatibility caused by dependency hell. There's a YARN issue to make this better by avoiding that code path when it's not needed, but I'm not sure what's the status of that. On Thu, Jul 28, 2016 at 6:54 PM, censj wrote: > ok ! solved !! >

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-28 Thread janardhan shetty
Thanks Steve. Any pointers to custom estimators development as well ? On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe wrote: > You can see the source for my transformer configurable bridge to Lucene > analysis components here, in my company Lucidworks’ spark-solr project: < >

Re: spark run shell On yarn

2016-07-28 Thread Marcelo Vanzin
You can probably do that in Spark's conf too: spark.hadoop.yarn.timeline-service.enabled=false On Thu, Jul 28, 2016 at 5:13 PM, Jeff Zhang wrote: > One workaround is disable timeline in yarn-site, > > set yarn.timeline-service.enabled as false in yarn-site.xml > > On Thu, Jul

Re: spark run shell On yarn

2016-07-28 Thread Jeff Zhang
One workaround is to disable the timeline service in yarn-site: set yarn.timeline-service.enabled to false in yarn-site.xml. On Thu, Jul 28, 2016 at 5:31 PM, censj wrote: > 16/07/28 17:07:34 WARN shortcircuit.DomainSocketFactory: The short-circuit > local reads feature cannot be used
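[Editor's note] A config sketch of the yarn-site.xml workaround described in this thread (the property name comes from the messages above; Marcelo's variant sets the same key only for Spark via spark.hadoop.yarn.timeline-service.enabled):

```xml
<!-- yarn-site.xml: disable the YARN timeline service cluster-wide -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```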

Re: Role-based S3 access outside of EMR

2016-07-28 Thread Everett Anderson
Hey, Just wrapping this up -- I ended up following the instructions to build a custom Spark release with Hadoop 2.7.2, stealing from Steve's SPARK-7481 PR a bit, in order to get Spark 1.6.2 + Hadoop 2.7.2 + the hadoop-aws library (which

Re: Spark 2.0 Build Failed

2016-07-28 Thread Sean Owen
You have at least two problems here: wrong Hadoop profile name, and some kind of firewall interrupting access to the Maven repo. It's not related to Spark. On Thu, Jul 28, 2016 at 4:04 PM, Ascot Moss wrote: > Hi, > > I tried to build spark, > > (try 1) > mvn -Pyarn

Spark 2.0 Build Failed

2016-07-28 Thread Ascot Moss
Hi, I tried to build spark, (try 1) mvn -Pyarn -Phadoop-2.7.0 -Dscala-2.11 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver -DskipTests clean package [INFO] Spark Project Parent POM ... FAILURE [ 0.658 s] [INFO] Spark Project Tags .
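[Editor's note] A hedged sketch of the corrected invocation, per Sean Owen's "wrong Hadoop profile name" diagnosis. Assumptions: Spark 2.0's pom.xml names the profile hadoop-2.7 (not hadoop-2.7.0), and -Dscala-2.11 is unnecessary because Scala 2.11 is the default build in 2.0. This does not address the user's second problem (blocked access to the Maven repo):

```shell
# Sketch, not verified in this user's environment:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 \
  -Phive -Phive-thriftserver -DskipTests clean package
```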

Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
Tried the following; it still failed the same way. It ran in yarn, cdh5.8.0. from pyspark import SparkContext, SparkConf conf = SparkConf().setAppName('s3 ---') sc = SparkContext(conf=conf) sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "...")

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Alexander Pivovarov
Found 0 matching posts for *ORC v/s Parquet for Spark 2.0* in Apache Spark User List http://apache-spark-user-list.1001560.n3.nabble.com/ Anyone have a link to this discussion? Want to share it with my colleagues. On Thu, Jul 28, 2016 at

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Mich Talebzadeh
As far as I know Spark still lacks the ability to handle Updates or deletes vis-à-vis ORC transactional tables. As you may know in Hive an ORC transactional table can handle updates and deletes. Transactional support was added to Hive for ORC tables. No transactional support with Spark SQL on ORC

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
After looking at the comments, I am not sure what the proposed fix is. On Fri, Jul 29, 2016 at 12:47 AM, Sean Owen wrote: > Ah, right. This wasn't actually resolved. Yeah your input on 15899 > would be welcome. See if the proposed fix helps. > > On Thu, Jul 28, 2016 at

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Sean Owen
Ah, right. This wasn't actually resolved. Yeah your input on 15899 would be welcome. See if the proposed fix helps. On Thu, Jul 28, 2016 at 11:52 AM, Rohit Chaddha wrote: > Sean, > > I saw some JIRA tickets and looks like this is still an open bug (rather > than an

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
Sean, I saw some JIRA tickets and it looks like this is still an open bug (rather than an improvement as marked in JIRA). https://issues.apache.org/jira/browse/SPARK-15893 https://issues.apache.org/jira/browse/SPARK-15899 I am experimenting, but do you know of any solution off the top of your head

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Solved!! The solution is using date_format with the “u” option. Thank you very much. Best, Carlo On 28 Jul 2016, at 18:59, carlo allocca wrote: Hi Mark, Thanks for the suggestion. I changed the maven entries as follows spark-core_2.10
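[Editor's note] For context on the fix: patterns in Spark SQL's date_format follow java.text.SimpleDateFormat, where "u" is the day number of the week with Monday = 1. A small Python analogue using isoweekday(), which follows the same Monday = 1 convention (the date below is this thread's own date, a Thursday):

```python
from datetime import date

# 2016-07-28 -- the date of this thread -- was a Thursday.
# isoweekday() numbers days Monday=1 .. Sunday=7, matching SimpleDateFormat's "u".
d = date(2016, 7, 28)
print(d.isoweekday())  # 4
```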

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
I am simply trying to do session.read().json("file:///C:/data/a.json"); in 2.0.0-preview it was working fine with sqlContext.read().json("C:/data/a.json"); -Rohit On Fri, Jul 29, 2016 at 12:03 AM, Sean Owen wrote: > Hm, file:///C:/... doesn't work? that should certainly

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Hatim Diab
I’m not familiar with Windows, but on Unix, if the path is /data/zxy then it’ll be file:///data/zxy, so I’d assume file://C:/ > On Jul 28, 2016, at 2:33 PM, Sean Owen wrote: > > Hm, file:///C:/... doesn't work? that should certainly be an absolute > URI with an absolute

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Sean Owen
Hm, file:///C:/... doesn't work? that should certainly be an absolute URI with an absolute path. What exactly is your input value for this property? On Thu, Jul 28, 2016 at 11:28 AM, Rohit Chaddha wrote: > Hello Sean, > > I have tried both file:/ and file:/// > Bit

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
Hello Sean, I have tried both file:/ and file:/// But it does not work and gives the same error. -Rohit On Thu, Jul 28, 2016 at 11:51 PM, Sean Owen wrote: > IIRC that was fixed, in that this is actually an invalid URI. Use > file:/C:/... I think. > > On Thu, Jul 28, 2016

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Sean Owen
IIRC that was fixed, in that this is actually an invalid URI. Use file:/C:/... I think. On Thu, Jul 28, 2016 at 10:47 AM, Rohit Chaddha wrote: > I upgraded from 2.0.0-preview to 2.0.0 > and I started getting the following error > > Caused by:

Re: ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
My bad. Please ignore this question. I accidentally reverted to sparkContext, causing the issue. On Thu, Jul 28, 2016 at 11:36 PM, Rohit Chaddha wrote: > In spark 2.0 there is an additional parameter of type ClassTag in the > broadcast method of the sparkContext > >

ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
In spark 2.0 there is an additional parameter of type ClassTag in the broadcast method of the sparkContext. What is this variable and how do I broadcast now? Here is my existing code with 2.0.0-preview: Broadcast> b = jsc.broadcast(u.collectAsMap()); what changes need to be

Custom Image RDD and Sequence Files

2016-07-28 Thread jtgenesis
Hey all, I was wondering what the best course of action is for processing an image that has an involved internal structure (file headers, sub-headers, image data, more sub-headers, more kinds of data etc). I was hoping to get some insight on the approach I'm using and whether there is a better,

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Mark, Thanks for the suggestion. I changed the maven entries as follows: spark-core_2.10 2.0.0 and spark-sql_2.10 2.0.0 As a result, it worked when I removed the following line of code to compute DAYOFWEEK (Monday → 1, etc.): Dataset

Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
I upgraded from 2.0.0-preview to 2.0.0 and I started getting the following error Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/ibm/spark-warehouse Any ideas how to fix this? -Rohit
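[Editor's note] The error can be reproduced outside Spark: file:C:/ibm/spark-warehouse has a scheme (so it is an absolute URI) but its path does not start with a slash (so the path is relative) — exactly the combination java.net.URI rejects. A sketch with Python's urllib.parse showing the difference; the form suggested elsewhere in the thread (file:///C:/... or file:/C:/...) yields an absolute path:

```python
from urllib.parse import urlparse

bad = urlparse("file:C:/ibm/spark-warehouse")      # scheme, but relative path
good = urlparse("file:///C:/ibm/spark-warehouse")  # scheme and absolute path

print(bad.scheme, bad.path)    # file C:/ibm/spark-warehouse
print(good.scheme, good.path)  # file /C:/ibm/spark-warehouse
```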

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Ofir Manor
BTW - this thread has many anecdotes on Apache ORC vs. Apache Parquet (I personally think both are great at this point). But the original question was about Spark 2.0. Anyone has some insights about Parquet-specific optimizations / limitations vs. ORC-specific optimizations / limitations in

Re: RDD vs Dataset performance

2016-07-28 Thread Reynold Xin
The performance difference is coming from the need to serialize and deserialize data to AnnotationText. The extra stage is probably very quick and shouldn't impact much. If you try to cache the RDD using serialized mode, it would slow down a lot too. On Thu, Jul 28, 2016 at 9:52 AM, Darin McBeath

Re: Unable to create a dataframe from json dstream using pyspark

2016-07-28 Thread Sunil Kumar Chinnamgari
Hi, I am attempting to create a dataframe from json in dstream but the code below does not seem to help get the dataframe right - import sys; import json; from pyspark import SparkContext; from pyspark.streaming import StreamingContext; from pyspark.sql import SQLContext; def

RDD vs Dataset performance

2016-07-28 Thread Darin McBeath
I started playing round with Datasets on Spark 2.0 this morning and I'm surprised by the significant performance difference I'm seeing between an RDD and a Dataset for a very basic example. I've defined a simple case class called AnnotationText that has a handful of fields. I create a

Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
BTW, I also tried yarn. Same error. When I ran the script, I used the real credentials for s3, which are omitted in this post. Sorry about that. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-read-s3-files-error-tp27417p27425.html Sent from

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Mark Hamstra
Don't use Spark 2.0.0-preview. That was a preview release with known issues, and was intended to be used only for early, pre-release testing purposes. Spark 2.0.0 is now released, and you should be using that. On Thu, Jul 28, 2016 at 3:48 AM, Carlo.Allocca wrote: >

Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread Andy Davidson
Hi Freedafeng, Can you tell us a little more? I.e., can you paste your code and error message? Andy From: freedafeng Date: Thursday, July 28, 2016 at 9:21 AM To: "user @spark" Subject: Re: spark 1.6.0 read s3 files error. > The question is, what

Re: spark 1.6.0 read s3 files error.

2016-07-28 Thread freedafeng
The question is, what is the cause of the problem? And how to fix it? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-read-s3-files-error-tp27417p27424.html

Re: performance problem when reading lots of small files created by spark streaming.

2016-07-28 Thread Gourav Sengupta
There is an option to join small files up. If you are unable to find it just let me know. Regards, Gourav On Thu, Jul 28, 2016 at 4:58 PM, Andy Davidson < a...@santacruzintegration.com> wrote: > Hi Pedro > > Thanks for the explanation. I started watching your repo. In the short > term I think

Re: Spark 2.0 - JavaAFTSurvivalRegressionExample doesn't work

2016-07-28 Thread Bryan Cutler
That's the correct fix. I have this done along with a few other Java examples that still use the old MLlib Vectors in this PR that's waiting for review https://github.com/apache/spark/pull/14308 On Jul 28, 2016 5:14 AM, "Robert Goodman" wrote: > I changed import in the sample

Re: performance problem when reading lots of small files created by spark streaming.

2016-07-28 Thread Andy Davidson
Hi Pedro Thanks for the explanation. I started watching your repo. In the short term I think I am going to try concatenating my small files into 64MB and using HDFS. My spark streaming app is implemented in Java and uses data frames. It writes to s3. My batch processing is written in Python. It reads

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Mich Talebzadeh
Like anything else, your mileage varies. ORC with vectorised query execution is the nearest one can get to a proper data warehouse like SAP IQ or Teradata with columnar indexes. To me that is cool. Parquet has been around

Re: Guys is this some form of Spam or someone has left his auto-reply loose LOL

2016-07-28 Thread Mich Talebzadeh
He says he is enjoying his holidays. Do we want to disturb him? :) Dr Mich Talebzadeh

Re: Guys is this some form of Spam or someone has left his auto-reply loose LOL

2016-07-28 Thread Pedro Rodriguez
Same here, but maybe this is a really urgent matter we need to contact him about... or just make a filter On Thu, Jul 28, 2016 at 7:59 AM, Mich Talebzadeh wrote: > > -- Forwarded message -- > From: Geert Van Landeghem [Napoleon Games NV] < >

Pls assist: need to create an udf that returns a LabeledPoint in pyspark

2016-07-28 Thread Marco Mistroni
Hi all, could anyone assist? I need to create a UDF function that returns a LabeledPoint. I read that in pyspark (1.6) LabeledPoint is not supported and I have to create a StructType. Can anyone point me in some direction? kr marco

Guys is this some form of Spam or someone has left his auto-reply loose LOL

2016-07-28 Thread Mich Talebzadeh
-- Forwarded message -- From: Geert Van Landeghem [Napoleon Games NV] < g.vanlandeg...@napoleongames.be> Date: 28 July 2016 at 14:38 Subject: Re: Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1 To: Mich Talebzadeh Hello, I am enjoying

Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mich Talebzadeh
Ok, does it create a Derby database and come back to the prompt? For example, does spark-sql work OK? If it cannot find the metastore it will create an empty Derby database in the same directory, and at the prompt you can type show databases; and that will only show default! HTH Dr Mich Talebzadeh

Re: Spark 2.0 - JavaAFTSurvivalRegressionExample doesn't work

2016-07-28 Thread Robert Goodman
I changed import in the sample from import org.apache.spark.mllib.linalg.*; to import org.apache.spark.ml.linalg.*; and the sample now runs. Thanks Bob On Wed, Jul 27, 2016 at 1:33 PM, Robert Goodman wrote: > I tried to run the

Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
Hi Mich, Thank you so much for the prompt response! I do have a copy of hive-site.xml in spark conf directory. On Thursday, July 28, 2016, Mich Talebzadeh wrote: > Hi, > > This line > > 2016-07-28 04:36:01,814] INFO Property hive.metastore.integral.jdo.pushdown >

Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mich Talebzadeh
Hi, This line 2016-07-28 04:36:01,814] INFO Property hive.metastore.integral.jdo.pushdown unknown - will be ignored (DataNucleus.Persistence:77) is telling me that you don't seem to have the softlink to hive-site.xml in $SPARK_HOME/conf hive-site.xml -> /usr/lib/hive/conf/hive-site.xml I

reasons for introducing SPARK-9415 - disable group by on MapType

2016-07-28 Thread Tomasz Bartczak
Hello, what were the reasons for disabling group-by and join on a column of MapType? It was done intentionally in https://issues.apache.org/jira/browse/SPARK-9415 I understand that it would not be feasible to do order by MapType, but group by requires only an equality check, which is possible for
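[Editor's note] To illustrate the point being made: group-by only needs equality (and, in practice, hashing), and maps can support that if their entries are put in a canonical order first. A plain-Python sketch of the idea, not Spark code:

```python
def canonical(m):
    # Two equal maps can store keys in different orders; sorting the items
    # gives a hashable canonical form usable as a group-by key.
    return tuple(sorted(m.items()))

rows = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 1}]
groups = {}
for m in rows:
    groups.setdefault(canonical(m), []).append(m)

print(len(groups))  # 2 -- the first two maps are equal
```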

Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
Could anyone please help me with this? I have been using the same version of Spark with CDH-5.4.5 successfully so far. However after a recent CDH upgrade I'm not able to run the same Spark SQL module against hive-1.1.0-cdh5.7.1. When I try to run my program Spark tries to connect to local derby

Re: Run times for Spark 1.6.2 compared to 2.1.0?

2016-07-28 Thread Colin Beckingham
On 27/07/16 16:31, Colin Beckingham wrote: I have a project which runs fine in both Spark 1.6.2 and 2.1.0. It calculates a logistic model using MLlib. I compiled 2.1 today from source and used the 1.6.2 precompiled version with Hadoop. The odd thing is that on 1.6.2 the project

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
And, of course, I am using org.apache.spark spark-core_2.11 2.0.0-preview and org.apache.spark spark-sql_2.11 2.0.0-preview (jar). Is the below problem/issue related to the

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
I have also found the following two related links: 1) https://github.com/apache/spark/commit/947b9020b0d621bc97661a0a056297e6889936d3 2) https://github.com/apache/spark/pull/12433 which both explain why it happens but nothing about what to do to solve it. Do you have any

Re: Fail a batch in Spark Streaming forcefully based on business rules

2016-07-28 Thread Hemalatha A
Another use case why I need to do this: if Exception A is caught I should just print it and ignore it, but if Exception B occurs, I have to end the batch, fail it and stop processing the batch. Is it possible to achieve this? Any hints on this, please. On Wed, Jul 27, 2016 at 10:42 AM, Hemalatha A

Re: create external table from partitioned avro file

2016-07-28 Thread Gourav Sengupta
Why avro? Regards, Gourav Sengupta On Thu, Jul 28, 2016 at 8:15 AM, Yang Cao wrote: > Hi, > > I am using spark 1.6 and I hope to create a hive external table based on > one partitioned avro file. Currently, I don’t find any build-in api to do > this work. I tried the

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Rui, Thanks for the prompt reply. No, I am not using Mesos. Ok. I am writing code to build a suitable dataset for my needs as in the following: == Session configuration: SparkSession spark = SparkSession .builder() .master("local[6]") //

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Sun Rui
Are you using Mesos? If not, https://issues.apache.org/jira/browse/SPARK-16522 is not relevant. You may describe more about your Spark environment, and give the full stack trace. > On Jul 28, 2016, at 17:44, Carlo.Allocca

Re: Spark Standalone Cluster: Having a master and worker on the same node

2016-07-28 Thread Chanh Le
Hi Jestin, I have seen most setups run the master and a slave on the same node, because I think the master doesn't do as much work as a slave does, and resources are expensive, so we need to use them. BTW, in my setup I run master and slave together. I have 5 nodes and 3 of them are master and slave running

Re: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed

2016-07-28 Thread Ascot Moss
Hi, Thanks for your reply. Permissions (access) are not an issue in my case; this issue only happened when the bigger input file was used to generate the model, i.e. with smaller input(s) all worked well. It seems to me that ".save" cannot save a big file. Q1: Any idea about the

SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi All, I am running SPARK locally, and when running d3=join(d1,d2) and d5=(d3, d4) I am getting the following exception "org.apache.spark.SparkException: Exception thrown in awaitResult”. Googling for it, I found that the closest is the answer reported

Re:Re: Spark 2.0 on YARN - Dynamic Resource Allocation Behavior change?

2016-07-28 Thread LONG WANG
Thanks for your reply. I have tried your suggestion; it works now. At 2016-07-28 16:18:01, "Sun Rui" wrote: Yes, this is a change in Spark 2.0. you can take a look at https://issues.apache.org/jira/browse/SPARK-13723 In the latest Spark On Yarn documentation for Spark

Re: Spark Thrift Server 2.0 set spark.sql.shuffle.partitions not working when query

2016-07-28 Thread Chanh Le
Thank you Takeshi, it works fine now. Regards, Chanh > On Jul 28, 2016, at 2:03 PM, Takeshi Yamamuro wrote: > > Hi, > > you need to set the value when you just start the server. > > // maropu > > On Thu, Jul 28, 2016 at 3:59 PM, Chanh Le
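[Editor's note] A sketch of what "set the value when you just start the server" looks like, assuming the stock sbin/start-thriftserver.sh launcher (which forwards --conf to spark-submit); the partition count is illustrative:

```shell
# Set shuffle partitions at server startup; setting it later from a JDBC
# session does not take effect, per this thread.
./sbin/start-thriftserver.sh --conf spark.sql.shuffle.partitions=200
```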

Materializing mapWithState .stateSnapshot() after ssc.stop

2016-07-28 Thread Ben Teeuwen
Hi all, I’ve posted a question regarding sessionizing events using scala and mapWithState at http://stackoverflow.com/questions/38541958/materialize-mapwithstate-statesnapshots-to-database-for-later-resume-of-spark-st

Re: Spark 2.0 on YARN - Dynamic Resource Allocation Behavior change?

2016-07-28 Thread Sun Rui
Yes, this is a change in Spark 2.0. you can take a look at https://issues.apache.org/jira/browse/SPARK-13723 In the latest Spark On Yarn documentation for Spark 2.0, there is

Re: ORC v/s Parquet for Spark 2.0

2016-07-28 Thread Jörn Franke
I see it more as a process of innovation and thus competition is good. Companies just should not follow these religious arguments but try themselves what suits them. There is more than software when using software ;) > On 28 Jul 2016, at 01:44, Mich Talebzadeh wrote:

Re: Any reference of performance tuning on SparkSQL?

2016-07-28 Thread Sonal Goyal
I found some references at http://spark.apache.org/docs/latest/sql-programming-guide.html#performance-tuning http://apache-spark-user-list.1001560.n3.nabble.com/Performance-tuning-in-Spark-SQL-td21871.html HTH Best Regards, Sonal Founder, Nube Technologies Reifier at

Spark 2.0 on YARN - Dynamic Resource Allocation Behavior change?

2016-07-28 Thread LONG WANG
Hi Spark Experts, Today I tried Spark 2.0 on YARN and also enabled the Dynamic Resource Allocation feature. I find that no matter whether I specify --num-executors in the spark-submit command or not, Dynamic Resource Allocation is used, but I remember when I specify
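[Editor's note] Per SPARK-13723, cited in Sun Rui's reply: in Spark 2.0 on YARN, --num-executors no longer disables dynamic allocation; it only sets the initial executor count. A spark-defaults.conf sketch (values are illustrative) for getting the old fixed-size behavior back:

```
# Disable dynamic allocation explicitly to pin a fixed number of executors
spark.dynamicAllocation.enabled   false
spark.executor.instances          4
```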

create external table from partitioned avro file

2016-07-28 Thread Yang Cao
Hi, I am using spark 1.6 and I hope to create a hive external table based on one partitioned avro file. Currently, I don’t find any built-in api to do this work. I tried write.format().saveAsTable, with format com.databricks.spark.avro; it returned an error that it can’t find a Hive serde for this.

Any reference of performance tuning on SparkSQL?

2016-07-28 Thread Linyuxin
Hi All, Is there any reference for performance tuning of SparkSQL? I can only find material on tuning Spark Core at http://spark.apache.org/

Re: Possible to push sub-queries down into the DataSource impl?

2016-07-28 Thread Takeshi Yamamuro
Hi, Have you seen this ticket? https://issues.apache.org/jira/browse/SPARK-12449 // maropu On Thu, Jul 28, 2016 at 2:13 AM, Timothy Potter wrote: > I'm not looking for a one-off solution for a specific query that can > be solved on the client side as you suggest, but