Re: Spark 3 pod template for the driver

2020-06-26 Thread Jorge Machado
Try to set spark.kubernetes.container.image > On 26. Jun 2020, at 14:58, Michel Sumbul wrote: > > Hi guys, > > I am trying to use Spark 3 on top of Kubernetes and to specify a pod template for > the driver. > > Here is my pod manifest for the driver, and when I do a spark-submit with the > option:
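For reference, a minimal spark-submit sketch combining the driver pod template option with an explicit container image, as suggested in the reply above. The master URL, image name, and paths are placeholders, not values from the original thread:

./bin/spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<registry>/spark:3.0.0 \
  --conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-template.yaml \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar

The point of the reply is that spark.kubernetes.container.image still needs to be set explicitly even when a pod template is supplied.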

Re: Using hadoop-cloud_2.12 jars

2020-06-22 Thread Jorge Machado
You can build it from source. Clone the spark git repo and run: ./build/mvn clean package -DskipTests -Phadoop-3.2 -Pkubernetes -Phadoop-cloud Regards > On 22. Jun 2020, at 11:00, Rahij Ramsharan wrote: > > Hello, > > I am trying to use the new S3 committers >
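Once built, wiring the new committers in is a separate step. As a rough sketch, these are the keys described in the Spark cloud-integration guide for the S3A committers; double-check them against the docs for your exact version:

spark.hadoop.fs.s3a.committer.name          directory
spark.sql.sources.commitProtocolClass       org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class    org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter

The two org.apache.spark.internal.io.cloud classes come from the spark-hadoop-cloud module produced by the -Phadoop-cloud profile.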

Re: Arrow RecordBatches/Pandas Dataframes to (Arrow enabled) Spark Dataframe conversion in streaming fashion

2020-05-25 Thread Jorge Machado
Hey, from what I know you can try to union them with df.union(df2). Not sure if this is what you need. > On 25. May 2020, at 13:53, Tanveer Ahmad - EWI wrote: > > Hi all, > > I need some help regarding Arrow RecordBatches/Pandas Dataframes to (Arrow > enabled) Spark Dataframe conversions. > Here
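To make the suggestion concrete, a minimal Scala sketch of unioning two DataFrames with matching schemas; the data and column names are made up for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("union-example").getOrCreate()
import spark.implicits._

// Two small DataFrames with identical schemas
val df1 = Seq((1, "a"), (2, "b")).toDF("id", "value")
val df2 = Seq((3, "c"), (4, "d")).toDF("id", "value")

// union appends rows positionally, so the columns must line up one-for-one
val combined = df1.union(df2)
combined.show()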

Re: How to deal Schema Evolution with Dataset API

2020-05-09 Thread Jorge Machado
Ok, I found a way to solve it. Just pass the schema like this: val schema = Encoders.product[Person].schema spark.read.schema(schema).parquet(“input”)…. > On 9. May 2020, at 13:28, Jorge Machado wrote: > > Hello everyone, > > One question to the community. >
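Put together, a sketch of that workaround with the evolved Person class from the question (the input path is a placeholder):

import org.apache.spark.sql.{Encoders, SparkSession}

case class Person(age: Int, name: Option[String] = None)

val spark = SparkSession.builder().appName("schema-evolution").getOrCreate()
import spark.implicits._

// Derive the schema from the current case class and force it on read;
// files written before the `name` column existed come back with name = None
val schema = Encoders.product[Person].schema
val people = spark.read.schema(schema).parquet("/path/to/input").as[Person]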

How to deal Schema Evolution with Dataset API

2020-05-09 Thread Jorge Machado
Hello everyone, one question to the community. Imagine I have this: case class Person(age: Int) spark.read.parquet(“inputPath”).as[Person] After a few weeks of coding I change the class to: case class Person(age: Int, name: Option[String] = None) Then when I run

How to run spark on GPUs

2019-06-26 Thread Jorge Machado
Hi guys, what is the currently recommended way to use GPUs on Spark? Which scheduler should we use? Mesos or Kubernetes? What are the approaches to follow until https://issues.apache.org/jira/browse/SPARK-24615 is in place. Thanks Jorge

Expecting 'type' to be present

2019-03-18 Thread Jorge Machado
Hi everyone, does anyone know what the error "Expecting 'type' to be present" means when using Spark on Mesos? ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://host:5050/api --deploy-mode cluster --conf spark.master.rest.enabled=true

Spark on Mesos broken on 2.4 ?

2019-03-18 Thread Jorge Machado
Hello everyone, I’m just trying out the spark-shell on Mesos and I don’t get any executors. To debug it I started the Vagrant box from Aurora and tried it out there, and I see the same issue as on my cluster. On Mesos the only active framework is the spark-shell; it is running 1.6.1

Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Jorge Machado
Have you tried to narrow down the problem so that we can be 100% sure that it lies in the array types? Just exclude them for the sake of testing. If we know 100% that it is the array columns, try to explode those columns into simple types. Jorge Machado > On 4 Jun 2018, at 11:09, Pra
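A sketch of those two debugging steps in Scala, with made-up frames standing in for the real schema (the array column is called "tags" here purely for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().appName("union-debug").getOrCreate()
import spark.implicits._

// Illustrative frames with one array column
val left  = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "tags")
val right = Seq((3, Seq("d"))).toDF("id", "tags")

// 1) Exclude the array column to see whether the union succeeds without it
val unionWithoutArrays = left.drop("tags").union(right.drop("tags"))

// 2) Or explode the array into simple rows and union those instead
val leftFlat  = left.select(col("id"), explode(col("tags")).as("tag"))
val rightFlat = right.select(col("id"), explode(col("tags")).as("tag"))
val unionFlat = leftFlat.union(rightFlat)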

Re: [Spark SQL] error in performing dataset union with complex data type (struct, list)

2018-06-04 Thread Jorge Machado
Try the same union with a dataframe without the array types. It could be something strange there, like column ordering. Jorge Machado > On 4 Jun 2018, at 10:17, Pranav Agrawal wrote: > > schema is exactly the same, not sure why it is failing though. > > root > |-- b

Re: Spark and Accumulo Delegation tokens

2018-03-23 Thread Jorge Machado
.toMap } If you could give me a tip there, that would be great. Thanks Jorge Machado > On 23 Mar 2018, at 07:38, Saisai Shao <sai.sai.s...@gmail.com> wrote: > > I think you can build your own Accumulo credential provider as similar to > HadoopDelegationTokenProvider out of Spark

Spark and Accumulo Delegation tokens

2018-03-23 Thread Jorge Machado
of HadoopDelegationTokenProvider for Accumulo be accepted ? Jorge Machado

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-22 Thread Jorge Machado
DataFrames are not mutable. Jorge Machado > On 22 Mar 2018, at 10:07, Aakash Basu <aakash.spark@gmail.com> wrote: > > Hey, > > I faced the same issue a couple of days back, kindly go through the mail > chain with "Multiple Kafka Spark Streaming Dataframe
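A small Scala sketch of what that means in practice (data and column names are illustrative): every transformation returns a new DataFrame, and the original is left untouched.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("immutability").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("value")

// withColumn does not modify df; it returns a brand-new DataFrame
val df2 = df.withColumn("doubled", $"value" * 2)

df.printSchema()   // still only "value"
df2.printSchema()  // "value" and "doubled"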

Re: Spark Druid Ingestion

2018-03-22 Thread Jorge Machado
Seems to me like a permissions problem! Can you check your user / folder permissions? Jorge Machado > On 22 Mar 2018, at 08:21, nayan sharma <nayansharm...@gmail.com> wrote: > > Hi All, > As druid uses Hadoop MapReduce to ingest batch data but I am trying spark for

HadoopDelegationTokenProvider

2018-03-21 Thread Jorge Machado
Hey Spark group, I want to create a delegation token provider for Accumulo and I have one question: how can I get the token that I added to the credentials from the executor side? The SecurityManager class is private… Thanks Jorge Machado

Re: HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Jorge Machado
Could it be that you are missing the HBASE_HOME variable? Jorge Machado > On 23 Feb 2018, at 04:55, Dharmin Siddesh J <siddeshjdhar...@gmail.com> wrote: > > I am trying to write a Spark program that reads data from HBase and stores it > in a DataFrame. > > I am
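If it is not HBASE_HOME, another common fix is to pass the ZooKeeper settings explicitly when building the HBase configuration, instead of relying on whatever the Spark session picks up. A sketch using standard HBase client properties (hostnames and port are placeholders; how the configuration is handed to the particular connector depends on that connector's API):

import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
// Point the client at the right ZooKeeper ensemble
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")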

Spark billing on shared Clusters

2017-08-23 Thread Jorge Machado
first software that addresses exactly this problem. Best regards Jorge Machado www.jmachado.me <http://www.jmachado.me/> jo...@jmachado.me <mailto:jo...@jmachado.me>

Spark billing on shared Clusters

2017-08-20 Thread Jorge Machado
first software that addresses exactly this problem. Best regards Jorge Machado www.jmachado.me <http://www.jmachado.me/> jo...@jmachado.me <mailto:jo...@jmachado.me>

Re: mysql and Spark jdbc

2017-01-12 Thread Jorge Machado
Nice, it worked!! Thanks. Jorge Machado www.jmachado.me > On 12 Jan 2017, at 17:46, Asher Krim <ak...@hubspot.com> wrote: > > Have you tried using an alias? You should be able to replace > ("dbtable”,"sometable") with ("dbtable”,"SELECT utc_
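For the record, a sketch of the alias trick in Scala, assuming an existing SparkSession named spark and the MySQL driver jar on the classpath: the whole query goes into dbtable as a derived table, so it reaches MySQL exactly as written. The URL, table, and credentials are placeholders:

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("driver", "com.mysql.jdbc.Driver")
  // a parenthesised subquery with an alias is accepted wherever a table name is expected
  .option("dbtable",
    "(SELECT `utc_timestamp`, id FROM sometable WHERE `utc_timestamp` <= 1347369839) AS t")
  .option("user", "user")
  .option("password", "password")
  .load()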

mysql and Spark jdbc

2017-01-12 Thread Jorge Machado
mestamp` FROM sometable WHERE utc_timestamp <= 1347369839 AND id >= 1451658240 AND id < 145955088 The problem is that utc_timestamp is a function in MySQL and it gets executed. How can I force Spark not to remove the backticks (``) in the WHERE clause? Jorge Machado www.jmachado.me <http://www.jmachado.me/>

Re: Running Multiple Versions of Spark on the same cluster (YARN)

2016-12-17 Thread Jorge Machado
Hi Tiago, thanks for the update. Last question: does this spark-submit that you are using need to be on the same version on all YARN hosts? Regards Jorge Machado > On 17 Dec 2016, at 16:46, Tiago Albineli Motta <timo...@gmail.com> wrote: > > Hi Jorge, > > Here w

Running Multiple Versions of Spark on the same cluster (YARN)

2016-12-16 Thread Jorge Machado
. Thx Jorge Machado

Spatial Spark Library on 1.6

2016-04-01 Thread Jorge Machado
Hi guys, does someone know a good library for geospatial operations? Magellan and Spatial Spark are broken or do not work properly on 1.6. Regards Jorge Machado www.jmachado.me

Re: Does SparkSql has official jdbc/odbc driver?

2016-03-29 Thread Jorge Machado
ming agents such as Flume or Storm, each batch of transactions) that alters a table or partition. At read time the reader merges the base and delta files, applying any updates and deletes as it reads. " Jorge Machado www.jmachado.me <http://www.jmachado.me/> > On 29/03/2016, at

Re: Does SparkSql has official jdbc/odbc driver?

2016-03-29 Thread Jorge Machado
ming agents such as Flume or Storm, each batch of transactions) that alters a table or partition. At read time the reader merges the base and delta files, applying any updates and deletes as it reads. " Jorge Machado www.jmachado.me > On 29/03/2016, at 10:27, Sage Meng <lk

Re: Does SparkSql has official jdbc/odbc driver?

2016-03-29 Thread Jorge Machado
Hi, you should know that “Spark” is not a relational database, so updates on data as you are used to in an RDBMS are not possible. Jorge Machado www.jmachado.me > On 29/03/2016, at 10:21, Sage Meng <lkke...@gmail.com> wrote: > > thanks, I found that I can use hive's jdbc dr

Re: value from groubBy paired rdd

2016-02-23 Thread Jorge Machado
out header pairs = data.map(lambda x: (x.split(",")[0], x.split(",")[1])) # <— only pass the status grouped = pairs.groupByKey() # <— each entry is (user_id, list of statuses for that user) print(grouped) Is this what you want? Jorge Machado www.jmachado.me > On 23/02/2016, at 2

Re: reasonable number of executors

2016-02-23 Thread Jorge Machado
Job> Basically it depends on your type of workload. Will you need caching? Jorge Machado www.jmachado.me > On 23/02/2016, at 15:49, Alex Dzhagriev <dzh...@gmail.com> wrote: > > Hello all, > > Can someone please advise me on the pros and cons on how to allocate the &

Re: Newbie questions regarding log processing

2016-02-22 Thread Jorge Machado
To get the logs you could use Flume to ship them from the servers to HDFS, for example, and stream on them. Check this: http://spark.apache.org/docs/latest/streaming-flume-integration.html and

Re: Option[Long] parameter in case class parsed from JSON DataFrame failing when key not present in JSON

2016-02-22 Thread Jorge Machado
Hi Anthony, I tried the code myself. I think the problem is in the jsonStr. I do it with: val jsonStr = """{"customer_id": "3ee066ab571e03dd5f3c443a6c34417a","product_id": 3}""" Or is it the “,” after your 3, or the “\n”? Regards > On 22/02/2016, at 15:42, Anthony Brew
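As a side note, a small sketch of reading JSON with an optional field into a case class, using the shape from the jsonStr above. This uses the Dataset[String] overload of read.json from newer Spark versions, and the second record is made up to show a missing key coming back as None:

import org.apache.spark.sql.SparkSession

case class Purchase(customer_id: String, product_id: Option[Long] = None)

val spark = SparkSession.builder().appName("optional-json").getOrCreate()
import spark.implicits._

val jsonLines = Seq(
  """{"customer_id": "3ee066ab571e03dd5f3c443a6c34417a", "product_id": 3}""",
  """{"customer_id": "placeholder-customer-without-product"}"""
)

// The record without product_id deserialises with product_id = None
val purchases = spark.read.json(jsonLines.toDS()).as[Purchase]
purchases.show()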

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
Hi Gourav, I did not understand your problem… the --packages option should not make any difference whether you are running standalone or on YARN, for example. Give us an example of what packages you are trying to load and what error you are getting… If you want to use the libraries in

Re: How to add kafka streaming jars when initialising the sparkcontext in python

2016-02-15 Thread Jorge Machado
Hi David, just package with Maven and deploy everything into one jar. You don’t need to do it like this… use Maven, for example. And check whether your cluster already has these libraries loaded. If you are using CDH, for example, you can just import the classes because they are already on the classpath

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 It will download everything for you and register it in your JVM. If you want to use it in production, just package it with Maven. > On 15/02/2016, at 12:14, Gourav Sengupta wrote: > > Hi, >
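Once the package is on the shell's classpath, reading a CSV with it looks roughly like this (Spark 1.x era API; the path is a placeholder):

// inside spark-shell started with --packages com.databricks:spark-csv_2.10:1.3.0
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/path/to/file.csv")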

Re: Spark : Unable to connect to Oracle

2016-02-10 Thread Jorge Machado
Hi Divya, you need to install the Oracle JDBC driver on the cluster, into the lib folder. > On 10/02/2016, at 09:37, Divya Gehlot wrote: > > oracle.jdbc.driver.OracleDrive
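Once the driver jar is on the classpath (for example via --jars on spark-submit), a JDBC read might look roughly like this; the host, service name, table, and credentials are placeholders:

val oracleDF = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("dbtable", "MYSCHEMA.MY_TABLE")
  .option("user", "user")
  .option("password", "password")
  .load()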

Re: [MLlib] What is the best way to forecast the next month page visit?

2016-02-01 Thread Jorge Machado
Hi Guru, so first transform your page names with OneHotEncoder ( https://spark.apache.org/docs/latest/ml-features.html#onehotencoder ), then do the same thing for months. You will end up with something like: (first
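A sketch of that encoding in Spark ML, using StringIndexer to index the categorical columns first and then OneHotEncoder on the indexed values; the column names and toy data are made up:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("onehot-example").getOrCreate()
import spark.implicits._

val visits = Seq(("home", "2016-01"), ("about", "2016-01"), ("home", "2016-02"))
  .toDF("page", "month")

// Index the page name and month, then one-hot encode each indexed column
val pageIndexer  = new StringIndexer().setInputCol("page").setOutputCol("pageIndex")
val pageEncoder  = new OneHotEncoder().setInputCol("pageIndex").setOutputCol("pageVec")
val monthIndexer = new StringIndexer().setInputCol("month").setOutputCol("monthIndex")
val monthEncoder = new OneHotEncoder().setInputCol("monthIndex").setOutputCol("monthVec")

val pipeline = new Pipeline()
  .setStages(Array(pageIndexer, pageEncoder, monthIndexer, monthEncoder))
val encoded = pipeline.fit(visits).transform(visits)
encoded.show(truncate = false)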

Re: Long running jobs in CDH

2016-01-13 Thread Jorge Machado
Hi Jan, Oozie, or you can check the --supervise option: http://spark.apache.org/docs/latest/submitting-applications.html > On 11/01/2016, at 14:23, Jan Holmberg wrote: > > Hi, > any
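For the --supervise route, the flag applies to cluster deploy mode on the standalone and Mesos masters and tells the master to restart the driver if it exits with a non-zero code. A sketch (master URL, class, and jar path are placeholders):

./bin/spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyLongRunningJob \
  /path/to/my-job.jar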

Best IDE Configuration

2016-01-09 Thread Jorge Machado
Hello everyone, I’m just wondering how you guys develop for Spark. For example, I cannot find any decent documentation for connecting Spark to Eclipse using Maven or sbt. Is there any link around? Thanks, Jorge

Re: Recommendations using Spark

2016-01-08 Thread Jorge Machado
Hello Anjali, you can start here: org.apache.spark.mllib.recommendation Then you should build a “recommender”: you need to transform your trainData into Rating objects, then you can train a model with, for example: val model = ALS.trainImplicit(trainData, 10, 5, 0.01, 1.0) Jorge > On
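Fleshing that out a bit, a minimal MLlib sketch with made-up data and the same ALS parameters as above (rank = 10, iterations = 5, lambda = 0.01, alpha = 1.0):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val sc = new SparkContext(new SparkConf().setAppName("als-implicit").setMaster("local[*]"))

// (user, product, implicit confidence) triples; the numbers are illustrative
val trainData = sc.parallelize(Seq(
  Rating(1, 10, 5.0),
  Rating(1, 20, 1.0),
  Rating(2, 10, 2.0)
))

val model = ALS.trainImplicit(trainData, 10, 5, 0.01, 1.0)

// Ask for the top 3 products for user 1
val recommendations = model.recommendProducts(1, 3)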

Date Time Regression as Feature

2016-01-07 Thread Jorge Machado
, [2015,12,10,10,10] )? I could not find any example with value prediction where features had dates in them. Thanks Jorge Machado jo...@jmachado.me

Date and Time as a Feature

2016-01-06 Thread Jorge Machado
] )? I could not find any example with value prediction where features had dates in them. Thanks Jorge Machado jo...@jmachado.me