Re: Read Local File

2017-06-14 Thread Dirceu Semighini Filho
Alright. Typing the path explicitly resolved it. But > this is a corner case. > > Alternately - if the file size is small, you could do spark-submit with a > --files option which will ship the file to every executor and is available > for all executors. > > > > > O
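For reference, a minimal sketch of the --files approach quoted above (the file name, paths and class names here are hypothetical); a file shipped this way is placed in each executor's working directory and can be resolved by name with SparkFiles:

  spark-submit --files /local/path/lookup.txt --class my.app.Main my-app.jar

  // inside the application code running on the executors:
  import org.apache.spark.SparkFiles
  val path  = SparkFiles.get("lookup.txt")                  // absolute local path on this node
  val lines = scala.io.Source.fromFile(path).getLines().toList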

Read Local File

2017-06-13 Thread Dirceu Semighini Filho
Hi all, I'm trying to read a file from the local filesystem. I have 4 workstations, 1 master and 3 slaves, running with Ambari and YARN, with Spark version 2.1.1.2.6.1.0-129. The code that I'm trying to run is quite simple: spark.sqlContext.read.text("file:///pathToFile").count I've copied the file in

Re: Why does dataset.union fails but dataset.rdd.union execute correctly?

2017-05-12 Thread Dirceu Semighini Filho
ct * from sample").as[A] > df.union(df1) > > It runs ok. And for nullabillity I thought that issue has been fixed: > https://issues.apache.org/jira/browse/SPARK-18058 > I think you can check your spark version and schema of dataset again? Hope > this help. > > Best, &

Re: Why does dataset.union fails but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
his should actually be fixed, and the union's schema > should have the less restrictive of the DataFrames. > > On Mon, May 8, 2017 at 12:46 PM, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote: > >> HI Burak, >> By nullability you mean that if I have the e

Re: Why does dataset.union fails but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
compatible column problem. For >> RDD, You may not see any error if you don't use the incompatible column. >> >> Dataset.union requires compatible schema. You can print ds.schema and >> ds1.schema and check if they are same. >> >> On Mon, May 8, 2017 at 11:07 A

Why does dataset.union fails but dataset.rdd.union execute correctly?

2017-05-08 Thread Dirceu Semighini Filho
Hello, I have a very complex case class structure, with a lot of fields. When I try to union two datasets of this class, it fails with the following error: ds.union(ds1) Exception in thread "main" org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the
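As a side note, a minimal sketch (with a made-up case class and paths) of how to compare the two schemas before calling union; differences in column order, types or nullability are what trigger this AnalysisException:

  import spark.implicits._
  case class A(id: Option[Long] = None, name: Option[String] = None)

  val ds  = spark.read.parquet("/some/path").as[A]
  val ds1 = spark.sql("select * from sample").as[A]

  // printing both schemas side by side usually exposes the mismatch
  ds.printSchema()
  ds1.printSchema()
  println(ds.schema == ds1.schema)   // the schemas must be compatible for ds.union(ds1)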

Cant convert Dataset to case class with Option fields

2017-04-07 Thread Dirceu Semighini Filho
Hi Devs, I have some case classes here, and their fields are all optional: case class A(b: Option[B] = None, c: Option[C] = None, ...) If I read some data into a Dataset and try to convert it to this case class using the as method, it doesn't give me any answer, it simply freezes. If I change the case
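For context, a minimal sketch of the pattern described above (class names, fields and the path are made up); note that as[A] only binds the encoder, so the freeze would actually surface when an action forces the conversion:

  case class B(value: Option[Int] = None)
  case class C(label: Option[String] = None)
  case class A(b: Option[B] = None, c: Option[C] = None)

  import spark.implicits._
  val ds = spark.read.parquet("/some/path").as[A]   // lazy: just attaches the encoder
  ds.collect()                                      // the hang described above would show up here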

Re: specifing schema on dataframe

2017-02-04 Thread Dirceu Semighini Filho
Hi Sam, remove the " from the number and it will work. On Feb 4, 2017 11:46 AM, "Sam Elamin" wrote: > Hi All > > I would like to specify a schema when reading from a json but when trying > to map a number to a Double it fails, I tried FloatType and IntType with
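A hedged sketch of the scenario under discussion, specifying the schema explicitly when reading JSON (field names and the path are hypothetical); a number written as a quoted string in the JSON, e.g. "1.5", is read as a string and will not map onto DoubleType, which matches the advice above to drop the quotes:

  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("id", StringType, nullable = true),
    StructField("amount", DoubleType, nullable = true)
  ))

  val df = spark.read.schema(schema).json("/path/to/file.json")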

Re: Time-Series Analysis with Spark

2017-01-11 Thread Dirceu Semighini Filho
Hello Rishabh, We have done some time-series forecasting using ARIMA in our project; it's on top of Spark and it's open source: https://github.com/eleflow/uberdata Kind Regards, Dirceu 2017-01-11 8:20 GMT-02:00 Sean Owen : > https://github.com/sryza/spark-timeseries ? >

Re: How many Spark streaming applications can be run at a time on a Spark cluster?

2016-12-24 Thread Dirceu Semighini Filho
Hi, You can start multiple Spark apps per cluster. You will have one streaming context per app. On Dec 24, 2016 18:22, "shyla deshpande" wrote: > Hi All, > > Thank you for the response. > > As per > > https://docs.cloud.databricks.com/docs/latest/databricks_ >
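A minimal sketch of what "one streaming context per app" means in practice; each submitted application constructs its own StreamingContext (the app name and batch interval here are arbitrary):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("stream-app-1")
  val ssc  = new StreamingContext(conf, Seconds(10))   // one context, owned by this application only
  // define the input DStreams and transformations here ...
  ssc.start()
  ssc.awaitTermination()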

Re: coalesce ending up very unbalanced - but why?

2016-12-14 Thread Dirceu Semighini Filho
change anything. Out of curiosity why did you suggest that? > Googling "spark coalesce prime" doesn't give me any clue :-) > Adrian > > > On 14/12/2016 13:58, Dirceu Semighini Filho wrote: > > Hi Adrian, > Which kind of partitioning are you using? > Have you alread

Re: coalesce ending up very unbalanced - but why?

2016-12-14 Thread Dirceu Semighini Filho
Hi Adrian, Which kind of partitioning are you using? Have you already tried to coalesce it to a prime number? 2016-12-14 11:56 GMT-02:00 Adrian Bridgett : > I realise that coalesce() isn't guaranteed to be balanced and adding a > repartition() does indeed fix this (at the
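For reference, a small sketch contrasting the two calls discussed in this thread (applied to some existing rdd; the partition count is arbitrary); coalesce only merges existing partitions and can stay skewed, while repartition performs a full shuffle and normally evens the partitions out:

  val coalesced  = rdd.coalesce(97)       // no shuffle: partition sizes may remain unbalanced
  val rebalanced = rdd.repartition(97)    // full shuffle: usually well balanced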

Re: Spark Streaming Data loss on failure to write BlockAdditionEvent failure to WAL

2016-11-17 Thread Dirceu Semighini Filho
w file is > created to store the block addition event. I need to look into the code > again to see when these files are created new and when they are appended. > > > Thanks, Arijit > > > -- > *From:* Dirceu Semighini Filho <dirceu.semigh...@gma

Re: Spark Streaming Data loss on failure to write BlockAdditionEvent failure to WAL

2016-11-17 Thread Dirceu Semighini Filho
Hi Arijit, Have you found a solution for this? I'm facing the same problem in Spark 1.6.1, but here the error happens only a few times, so our HDFS does support append. This is what I can see in the logs: 2016-11-17 13:43:20,012 ERROR [BatchedWriteAheadLog Writer] WriteAheadLogManager for Thread:
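For context, a hedged sketch of the configuration this thread revolves around; the receiver write-ahead log is switched on with the property below and its files are written under the streaming checkpoint directory (the directory path here is hypothetical):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf()
    .setAppName("wal-app")
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")

  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint("hdfs:///user/spark/checkpoints/wal-app")   // the WAL lives under this directory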

Re: Writing parquet table using spark

2016-11-16 Thread Dirceu Semighini Filho
Hello, Have you configured this property? spark.sql.parquet.compression.codec 2016-11-16 6:40 GMT-02:00 Vaibhav Sinha : > Hi, > I am using hiveContext.sql() method to select data from source table and > insert into parquet tables. > The query executed from spark takes
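A minimal example of setting the property mentioned above before writing, in the same hiveContext style as the question (the table names and codec choice are illustrative):

  hiveContext.setConf("spark.sql.parquet.compression.codec", "snappy")
  hiveContext.sql("INSERT INTO TABLE target_parquet SELECT * FROM source_table")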

Can somebody remove this guy?

2016-09-23 Thread Dirceu Semighini Filho
Can somebody remove this guy from the list tod...@yahoo-inc.com I just sent a message to the list and received a mail from Yahoo saying that this email doesn't exist anymore. This is an automatically generated message. tod...@yahoo-inc.com is no longer with Yahoo! Inc. Your message will not be

Re: Reply: Reply: it does not stop at breakpoints which is in an anonymous function

2016-09-23 Thread Dirceu Semighini Filho
val x = random * 2 - 1 (breakpoint-1) > val y = random * 2 - 1 > if (x*x + y*y < 1) 1 else 0 > }.reduce(_ + _) > println("Pi is roughly " + 4.0 * count / (n - 1)) > spark.stop() > } > } > > > > >

Re: Reply: it does not stop at breakpoints which is in an anonymous function

2016-09-16 Thread Dirceu Semighini Filho
the evaluation. > > Thank you very much > ------ > *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com> > *Sent:* September 16, 2016 21:07 > *To:* chen yong > *Cc:* user@spark.apache.org > *Subject:* Re: Reply: Reply: Reply: Reply: it does not stop at breakpoints which is in

Re: Reply: Reply: Reply: Reply: it does not stop at breakpoints which is in an anonymous function

2016-09-16 Thread Dirceu Semighini Filho
evious replies. > > Later, I guess > > the line > > val test = count > > is the key point. Without it, it would not stop at the breakpoint-1, right? > > > > -- > *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com> > *Sent:* 2016

Re: Reply: Reply: Reply: it does not stop at breakpoints which is in an anonymous function

2016-09-15 Thread Dirceu Semighini Filho
his line) > if (x*x + y*y < 1) 1 else 0 > }.reduce(_ + _) > val test = x (breakpoint-2 set in this line) > > > > -- > *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com> > *Sent:* September 14, 2016 23:32 > *To:* chen yong

Re: it does not stop at breakpoints which is in an anonymous function

2016-09-14 Thread Dirceu Semighini Filho
Hello Felix, Spark functions are evaluated lazily, and that's why it doesn't stop at those breakpoints. They will be executed only when you call an action on your dataframe/RDD, like count, collect, ... Regards, Dirceu 2016-09-14 11:26 GMT-03:00 chen yong : > Hi all, > > > > I am
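A small sketch of the point above: a breakpoint placed inside the body of a transformation is only reached once an action forces the evaluation (the numbers here are arbitrary):

  val doubled = sc.parallelize(1 to 100).map { n =>
    n * 2                     // a breakpoint here is not hit yet: map is lazy
  }
  val total = doubled.count() // the action runs the job, and only now the breakpoint is hit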

Re: Forecasting algorithms in spark ML

2016-09-08 Thread Dirceu Semighini Filho
Hi Madabhattula Rajesh Kumar, There is an open source project called sparkts (Time Series for Spark) that implements the ARIMA and Holt-Winters algorithms on top of Spark, which can be used for forecasting. In some cases, Linear Regression, which is available
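Purely as an illustration, a sketch of fitting an ARIMA model with the sparkts project mentioned above; the ARIMA.fitModel and forecast signatures are assumed from memory and should be checked against the project's documentation, and the model order and sample values are invented:

  import org.apache.spark.mllib.linalg.Vectors
  import com.cloudera.sparkts.models.ARIMA

  val series   = Vectors.dense(Array(12.0, 14.5, 13.8, 15.2, 16.0, 15.7, 17.1))
  val model    = ARIMA.fitModel(1, 0, 1, series)   // ARIMA(p=1, d=0, q=1), assumed signature
  val forecast = model.forecast(series, 3)         // next 3 points, assumed signature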

Re: Debug spark jobs on Intellij

2016-05-31 Thread Dirceu Semighini Filho
Try this: Is this Python, right? I'm not used to it, I'm used to Scala, so val toDebug = rdd.foreachPartition(partition -> { //breakpoint stop here *// by val toDebug I mean to assign the result of foreachPartition to a variable* partition.forEachRemaining(message -> { //breakpoint

Re: Debug spark jobs on Intellij

2016-05-31 Thread Dirceu Semighini Filho
Hi Marcelo, this is because the operations on an RDD are lazy; you will only stop at the breakpoint inside foreach when you call an action such as first, collect or reduce. That is when Spark will run the operations. Have you tried that? Cheers. 2016-05-31 17:18 GMT-03:00 Marcelo Oikawa

Re: ClassNotFoundException in RDD.map

2016-03-23 Thread Dirceu Semighini Filho
at normally the typechecker > could catch, can slip through. > > On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho > <dirceu.semigh...@gmail.com> wrote: > > Hi Ted, thanks for answering. > > The map is just that, whenever I try inside the map it throws this >

Re: Serialization issue with Spark

2016-03-23 Thread Dirceu Semighini Filho
Hello Hafsa, A Task not serializable exception usually means that you are trying to use an object, defined in the driver, in code that runs on the workers. Can you post the code that is generating this error here, so we can better advise you? Cheers. 2016-03-23 14:14 GMT-03:00 Hafsa Asif
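A hedged sketch of the usual shape of this problem and a common way around it (the class and field names are invented, and rdd stands for some existing RDD of numbers):

  class Lookup(val factor: Int)          // does not extend Serializable

  val lookup = new Lookup(10)            // lives only in the driver

  // Fails with Task not serializable: the closure drags the whole Lookup instance to the workers.
  // rdd.map(x => x * lookup.factor)

  // Works: copy just the primitive value into a local val and close over that instead.
  val factor = lookup.factor
  val scaled = rdd.map(x => x * factor)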

ClassNotFoundException in RDD.map

2016-03-20 Thread Dirceu Semighini Filho
Hello, I found a strange behavior after executing a prediction with MLlib. My code returns an RDD[(Any,Double)] where Any is the id of my dataset, which is BigDecimal, and Double is the prediction for that line. When I run myRdd.take(10) it returns ok res16: Array[_ >: (Double, Double) <: (Any,

Re: ClassNotFoundException in RDD.map

2016-03-19 Thread Dirceu Semighini Filho
code isn't wrong. Kind Regards, Dirceu 2016-03-17 12:50 GMT-03:00 Ted Yu <yuzhih...@gmail.com>: > bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1 > > Do you mind showing more of your code involving the map() ? > > On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini

Re: SparkR Count vs Take performance

2016-03-02 Thread Dirceu Semighini Filho
n Owen [mailto:so...@cloudera.com] > Sent: Wednesday, March 2, 2016 3:37 AM > To: Dirceu Semighini Filho <dirceu.semigh...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: Re: SparkR Count vs Take performance > > Yeah one surprising result is that you can't call i

Re: SparkR Count vs Take performance

2016-03-01 Thread Dirceu Semighini Filho
it's slower than > a count in all but pathological cases. > > > > On Tue, Mar 1, 2016 at 6:03 PM, Dirceu Semighini Filho > <dirceu.semigh...@gmail.com> wrote: > > Hello all. > > I have a script that create a dataframe from this operation: > > > > m

SparkR Count vs Take performance

2016-03-01 Thread Dirceu Semighini Filho
Hello all. I have a script that creates a dataframe from this operation: mytable <- sql(sqlContext,("SELECT ID_PRODUCT, ... FROM mytable")) rSparkDf <- createPartitionedDataFrame(sqlContext,myRdataframe) dFrame <- join(mytable,rSparkDf,mytable$ID_PRODUCT==rSparkDf$ID_PRODUCT) After filtering

Re: Client session timed out, have not heard from server in

2015-12-22 Thread Dirceu Semighini Filho
Hi Yash, I've experienced this behavior here when the process freezes in a worker. This mainly happens, in my case, when the worker memory was full and the Java GC wasn't able to free memory for the process. Try to search for OutOfMemory errors in your worker logs. Regards, Dirceu 2015-12-22 10:26

Re: How to set memory for SparkR with master="local[*]"

2015-10-23 Thread Dirceu Semighini Filho
Hi Matej, I'm also using this and I'm seeing the same behavior here; my driver has only 530mb, which is the default value. Maybe this is a bug. 2015-10-23 9:43 GMT-02:00 Matej Holec : > Hello! > > How to adjust the memory settings properly for SparkR with > master="local[*]" >

Re:

2015-10-15 Thread Dirceu Semighini Filho
Hi Anfernee, A subject in the email sometimes helps ;) Have you checked whether the link is sending you to a hostname that is not accessible from your workstation? Sometimes changing the hostname to the IP solves this kind of issue. 2015-10-15 13:34 GMT-03:00 Anfernee Xu : > Sorry, I

Spark 1.5.1 ThriftServer

2015-10-15 Thread Dirceu Semighini Filho
Hello, I'm trying to migrate to Scala 2.11 and I didn't find a spark-thriftserver jar for Scala 2.11 in the Maven repository. I could do a manual build (without tests) of Spark with the thrift server in Scala 2.11. Some time ago the thrift server build wasn't enabled by default, but I can find a 2.10 jar for
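For reference, a hedged sketch of such a manual build using the standard Spark build scripts; the exact profiles and flags vary between releases (in the 1.x line the Scala version switch was a property), so treat this as an outline rather than the exact command:

  ./dev/change-scala-version.sh 2.11
  build/mvn -Phive -Phive-thriftserver -Dscala-2.11 -DskipTests clean package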

Re: Null Value in DecimalType column of DataFrame

2015-09-18 Thread Dirceu Semighini Filho
for your use case. You > need a decimal type that has precision - scale >= 2. > > On Tue, Sep 15, 2015 at 6:39 AM, Dirceu Semighini Filho < > dirceu.semigh...@gmail.com> wrote: > >> >> Hi Yin, posted here because I think it's a bug. >> So, it will return nu
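As a small illustration of the precision/scale rule quoted above (the column name is made up): with DecimalType(precision, scale) the integer part can hold precision - scale digits, so a value like 10.5 needs precision - scale >= 2, otherwise it comes back as null:

  import org.apache.spark.sql.types._

  val tooNarrow  = DecimalType(2, 1)   // precision - scale = 1: 10.5 does not fit and becomes null
  val wideEnough = DecimalType(3, 1)   // precision - scale = 2: 10.5 fits

  val schema = StructType(Seq(StructField("amount", wideEnough, nullable = true)))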