Re: Do we need to kill a spark job every time we change and deploy it?

2018-11-28 Thread Irving Duran
Are you referring to having Spark pick up a new jar build? If so, you can probably script that in bash. Thank You, Irving Duran On Wed, Nov 28, 2018 at 12:44 PM Mina Aslani wrote: > Hi, > > I have a question for you. > Do we need to kill a spark job every time we chang

Re: spark-shell doesn't start

2018-06-19 Thread Irving Duran
You are trying to run "spark-shell" as a command, but it is not in your environment's PATH. You might want to do "./spark-shell" or try "sudo ln -s /path/to/spark-shell /usr/bin/spark-shell" and then run "spark-shell". Thank You, Irving Duran On Sun, Jun 17, 2018

Re: [Spark] Supporting python 3.5?

2018-06-19 Thread Irving Duran
Cool, thanks for the validation! Thank You, Irving Duran On Thu, May 24, 2018 at 8:20 PM Jeff Zhang wrote: > > It supports Python 3.5, and IIRC, Spark also supports Python 3.6 > > Irving Duran wrote on Thu, May 10, 2018 at 9:08 PM: > >> Does Spark now support Python 3.5, or is it ju

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Irving Duran
So would you recommend not having Toree and BeakerX installed together, to avoid conflicts? Thank you, Irving Duran On 06/07/2018 07:55 PM, s...@draves.org wrote: > The %%spark magic comes with BeakerX's Scala kernel; it is not related to Toree. > > On Thu, Jun 7, 2018, 8:51 PM Stephen Boesch <

Re: If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

2018-06-07 Thread Irving Duran
I haven't noticed or seen this behavior. Have you tested the same dataset on both versions to compare? Thank you, Irving Duran On 06/06/2018 11:22 PM, 李斌松 wrote: > If there is timestamp type data in DF, Spark 2.3 toPandas is much > slower than Spark 2.2.
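For anyone comparing the two versions, a minimal PySpark sketch of one setting worth timing on Spark 2.3: toPandas with and without Arrow-based conversion, which is off by default in 2.3 (the toy DataFrame below is only for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("topandas-timing").getOrCreate()

    # Arrow-based conversion is disabled by default in Spark 2.3 and can
    # change toPandas performance noticeably; compare timings with it on and off.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    df = spark.range(1000000).selectExpr("id", "current_timestamp() as ts")
    pdf = df.toPandas()
    print(pdf.dtypes)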

Re: Apache Spark Installation error

2018-05-31 Thread Irving Duran
You probably want "spark-shell" to be recognized as a command in your environment. Maybe try "sudo ln -s /path/to/spark-shell /usr/bin/spark-shell". Have you tried "./spark-shell" in the current path to see if it works? Thank You, Irving Duran On Thu, May 31, 2018 a

Re: GroupBy in Spark / Scala without Agg functions

2018-05-29 Thread Irving Duran
Unless you want to get a count, yes. Thank You, Irving Duran On Tue, May 29, 2018 at 1:44 PM Chetan Khatri wrote: > Georg, I just want to double check that someone wrote an MSSQL Server script > where it groups by all columns. What is the alternate best way to do distinct > al
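A quick PySpark sketch of the two options discussed here, a plain distinct versus grouping by every column with a count (the toy DataFrame and column names are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "label"])

    # Grouping by all columns with no aggregate is just deduplication.
    deduped = df.distinct()

    # If you also need to know how often each row occurred, group by all
    # columns and count.
    counts = df.groupBy(*df.columns).count()

    deduped.show()
    counts.show()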

[Spark] Supporting python 3.5?

2018-05-10 Thread Irving Duran
Does Spark now support Python 3.5, or is it just 3.4.x? https://spark.apache.org/docs/latest/rdd-programming-guide.html Thank You, Irving Duran

Re: [pyspark] Read multiple files parallely into a single dataframe

2018-05-04 Thread Irving Duran
I could be wrong, but I think you can use a wildcard: df = spark.read.format('csv').load('/path/to/file*.csv.gz') Thank You, Irving Duran On Fri, May 4, 2018 at 4:38 AM Shuporno Choudhury < shuporno.choudh...@gmail.com> wrote: > Hi, > > I want to read multiple files p
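A small sketch of both variants, a glob pattern and an explicit list of paths (the paths are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Glob pattern: Spark expands the wildcard and reads every match into
    # a single DataFrame.
    df_glob = spark.read.csv("/path/to/file*.csv.gz", header=True)

    # Alternatively, pass an explicit list of paths.
    df_list = spark.read.csv(
        ["/path/to/file1.csv.gz", "/path/to/file2.csv.gz"], header=True)

    print(df_glob.count(), df_list.count())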

Re: ML Linear and Logistic Regression - Poor Performance

2018-05-02 Thread Irving Duran
You may want to think about reducing the number of iterations; right now you have it set to 500. Thank You, Irving Duran On Fri, Apr 27, 2018 at 7:15 PM Thodoris Zois <z...@ics.forth.gr> wrote: > I am on CentOS 7 and I use Spark 2.3.0. Below I have posted my code. > Logistic regres
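For reference, a minimal sketch of where the iteration cap lives in spark.ml's logistic regression (the toy data, column names and the value 100 are assumptions for illustration, not the poster's code):

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(0.0, 1.0, 2.0), (1.0, 3.0, 4.0)], ["label", "x1", "x2"])
    train = VectorAssembler(
        inputCols=["x1", "x2"], outputCol="features").transform(df)

    # 500 iterations is often far more than needed; try a smaller cap and
    # check the objective history to see whether training already converged.
    lr = LogisticRegression(maxIter=100, regParam=0.01)
    model = lr.fit(train)
    print(len(model.summary.objectiveHistory))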

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Irving Duran
I don't think there is a magic number, so I would say that it will depend on how big your dataset is and the size of your worker(s). Thank You, Irving Duran On Sat, Apr 28, 2018 at 10:41 AM klrmowse <klrmo...@gmail.com> wrote: > i am currently trying to find a workaround for
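A short sketch of the usual workarounds when a full collect() will not fit on the driver (the dataset here is a stand-in):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10000000)

    # Pull back only a bounded number of rows instead of everything.
    head = df.take(100)

    # Or stream rows to the driver one partition at a time.
    for row in df.toLocalIterator():
        pass  # process row by row without materializing the whole dataset

    # Or keep the result on the cluster and write it out instead of collecting.
    df.write.mode("overwrite").parquet("/tmp/result.parquet")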

Re: ML Linear and Logistic Regression - Poor Performance

2018-04-27 Thread Irving Duran
Are you reformatting the data correctly for logistic regression (meaning 0s and 1s) before modeling? What OS and Spark version are you using? Thank You, Irving Duran On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <z...@ics.forth.gr> wrote: > Hello, > > I am running an experime
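A minimal sketch of the kind of label reformatting being asked about, mapping a raw outcome column to 0/1 before fitting (the column names and values are made up):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import when, col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("yes", 1.5), ("no", 0.2), ("yes", 3.1)], ["outcome", "x"])

    # spark.ml's logistic regression expects a numeric 0/1 label column.
    labeled = df.withColumn(
        "label", when(col("outcome") == "yes", 1.0).otherwise(0.0))
    labeled.show()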

Re: Can spark handle this scenario?

2018-02-16 Thread Irving Duran
Do you only want to use Scala? Otherwise, I think that with PySpark and pandas' read_table you should be able to accomplish what you want. Thank you, Irving Duran On 02/16/2018 06:10 PM, Lian Jiang wrote: > Hi, > > I have a use case: > > I want to download S
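A rough sketch of the PySpark-plus-pandas route suggested here, reading a downloaded file with pandas and handing it to Spark (the path and file layout are assumptions about the poster's data):

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the downloaded file locally with pandas...
    pdf = pd.read_table("/tmp/downloaded_data.tsv")  # placeholder path

    # ...then convert it to a Spark DataFrame for distributed processing.
    df = spark.createDataFrame(pdf)
    df.printSchema()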

Re: Do we always need to go through spark-submit?

2017-08-30 Thread Irving Duran
I don't know how this would work, but maybe your jar could call spark-submit from within itself, if you compile it with the spark-submit class. Thank You, Irving Duran On Wed, Aug 30, 2017 at 10:57 AM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > I understa

Re: [Spark] Can Apache Spark be used with time series processing?

2017-08-30 Thread Irving Duran
I think it will work. You might want to explore Spark Streaming. Thank You, Irving Duran On Wed, Aug 30, 2017 at 10:50 AM, <kanth...@gmail.com> wrote: > I don't see why not > > Sent from my iPhone > > > On Aug 24, 2017, at 1:52 PM, Alexandr Porunov < > ale
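For a flavor of what time-windowed processing looks like in Spark SQL, here is a minimal batch sketch (the event schema is invented; Structured Streaming uses the same window function):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import window, avg

    spark = SparkSession.builder.getOrCreate()
    events = spark.createDataFrame(
        [("2017-08-30 10:00:15", 1.0),
         ("2017-08-30 10:04:50", 3.0),
         ("2017-08-30 10:11:02", 5.0)],
        ["ts", "value"]).selectExpr("cast(ts as timestamp) as ts", "value")

    # Bucket events into 5-minute windows and aggregate -- the core of most
    # time-series style processing in Spark.
    events.groupBy(window("ts", "5 minutes")).agg(avg("value")).show(truncate=False)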

Re: Reading Excel (.xlsm) file through PySpark 2.1.1 with external JAR is causing fatal conversion of data type

2017-08-16 Thread Irving Duran
I think there is a difference between the actual value in the cell and how Excel formats that cell. You probably want to import that field as a string, or not have it formatted as a date in Excel. Just a thought. Thank You, Irving Duran On Wed, Aug 16, 2017 at 12:47 PM, Aakash Basu
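A hedged sketch of the "import it as a string" idea: whatever the external Excel reader returns, cast the ambiguous column to text so formatting and type conversion cannot change it (the stand-in DataFrame and the column name order_date are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Stand-in for the DataFrame the external Excel reader would return.
    df = spark.createDataFrame(
        [("2017-08-16 00:00:00",)], ["order_date"]).selectExpr(
        "cast(order_date as timestamp) as order_date")

    # Keep the value as plain text so the date/timestamp conversion cannot
    # silently alter what ends up downstream.
    df_as_text = df.withColumn("order_date", col("order_date").cast("string"))
    df_as_text.printSchema()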

Re: ALSModel.load not working on pyspark 2.1.0

2017-07-31 Thread Irving Duran
;org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. >>> model2 = model.load('/models/als.test') >>> model ALS_4324a1082d889dd1f0e4 >>> model2 ALS_4324
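For context, the save/load round trip being tested looks roughly like this in pyspark.ml (the ratings data, columns and paths are placeholders, not the original poster's setup):

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS, ALSModel

    spark = SparkSession.builder.getOrCreate()
    ratings = spark.createDataFrame(
        [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0)],
        ["userId", "movieId", "rating"])

    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating", rank=5)
    model = als.fit(ratings)

    # Persist the fitted model, then load it back as a second instance.
    model.save("/tmp/models/als.test")
    model2 = ALSModel.load("/tmp/models/als.test")
    print(model.uid, model2.uid)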

Re: Scala, Python or Java for Spark programming

2017-06-13 Thread Irving Duran
> distributed applications behind Spark which may have unforeseen side effects if the users do not know this, i.e. if they have never been used to parallel programming.
>
> On 7. Jun 2017, at 17:20, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> I am a fan of Scala and functional programming, hence I prefer Scala.
>
> I had a discussion with a hardcore Java programmer and a data scientist who prefers Python.
>
> Their view is that in collaborative work using Scala it is almost impossible to understand someone else's Scala code.
>
> Hence I was wondering how much truth there is in this statement. Given that Spark uses Scala as its core development language, what is the general view on the use of Scala, Python or Java?
>
> Thanks,
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
-- Thank You, Irving Duran

Re: Adding header to an rdd before saving to text file

2017-06-06 Thread Irving Duran
and finally write it to a txt >> file. >> >> What's the best way to add the header from the source file into the rdd and have >> it available as the header in the new file, i.e., when I transform the rdd into a >> text file using saveAsTextFile("newfile") the header 1, header 2 shall be >> available. >> >> >> Thanks, >> Upendra >> > > -- Thank You, Irving Duran
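A minimal RDD sketch of the approach usually suggested for this: put the header into its own one-element RDD and union it ahead of the data (paths and header text are placeholders; with several partitions the header lands at the top of the first part file):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    header = sc.parallelize(["header 1,header 2"], 1)
    data = sc.parallelize(["1,a", "2,b", "3,c"])

    # union keeps the header partition first, so it comes out at the start
    # of the saved output.
    header.union(data).saveAsTextFile("/tmp/newfile")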

Re: Edge Node in Spark

2017-06-06 Thread Irving Duran
Where in the documentation did you find "edge node"? Spark would call it a worker or executor, but not "edge node". Here is some info about yarn logs -> https://spark.apache.org/docs/latest/running-on-yarn.html. Thank You, Irving Duran On Tue, Jun 6, 2017 at 11:48 A

Re: Edge Node in Spark

2017-06-06 Thread Irving Duran
Ashok, Are you working with straight spark or referring to GraphX? Thank You, Irving Duran On Mon, Jun 5, 2017 at 3:45 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote: > Hi, > > I am a bit confused between Edge node, Edge server and gateway node in > Spark. > >

Re: Issue upgrading to Spark 2.1.1 from 2.1.0

2017-05-07 Thread Irving Duran
I haven't noticed that behavior with ALS. Thank you, Irving Duran On 05/07/2017 04:14 PM, mhornbech wrote: > Hi > > We have just tested the new Spark 2.1.1 release, and observe an issue where > the driver program hangs when making predictions using a random forest. The > issue

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Irving Duran
Thanks for the share! Thank You, Irving Duran On Sun, Apr 2, 2017 at 7:19 PM, Felix Cheung <felixcheun...@hotmail.com> wrote: > Interesting! > > -- > *From:* Robert Yokota <rayok...@gmail.com> > *Sent:* Sunday, April 2, 2017 9:40:07 AM &g

Re: [SparkSQL] pre-check syntex before running spark job?

2017-02-21 Thread Irving Duran
You can also run it in the REPL and test whether you are getting the expected result. Thank You, Irving Duran On Tue, Feb 21, 2017 at 8:01 AM, Yong Zhang <java8...@hotmail.com> wrote: > You can always use the explain method to validate your DF or SQL, before any > action
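A tiny illustration of the explain suggestion: planning the query catches syntax and analysis errors (bad column names, missing tables) without running any action (the table and query are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(10).createOrReplaceTempView("t")

    # explain() only plans the query; nothing is executed on the cluster.
    spark.sql("SELECT id FROM t WHERE id > 5").explain()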

Re: Graphx Examples for ALS

2017-02-17 Thread Irving Duran
Not sure I follow your question. Do you want to use ALS or GraphX? Thank You, Irving Duran On Fri, Feb 17, 2017 at 7:07 AM, balaji9058 <kssb...@gmail.com> wrote: > Hi, > > Where can I find the ALS recommendation algorithm for large data sets? > > Please feel free to share

Re: Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-09 Thread Irving Duran
I would say Java, since it will be somewhat similar to Scala. Now, this assumes that you have some app already written in Scala. If you don't, then pick the language that you feel most comfortable with. Thank you, Irving Duran On Feb 9, 2017, at 11:59 PM, nancy henry <nancyhen

Re: Spark: Scala Shell Very Slow (Unresponsive)

2017-02-06 Thread Irving Duran
be your connection rather than spark. Thank You, Irving Duran On Thu, Feb 2, 2017 at 3:34 PM, jimitkr <ji...@softpath.net> wrote: > Friends, > > After i launch spark-shell, the default Scala shell appears but is > unresponsive. > > When i type any command on the shell, noth

Re: Shortest path performance in Graphx with Spark

2017-01-11 Thread Irving Duran
-memory 8G \ --driver-memory 8G Thank You, Irving Duran On Tue, Jan 10, 2017 at 12:20 PM, Gerard Casey <gerardhughca...@gmail.com> wrote: > Hello everyone, > > I am creating a graph from a `gz` compressed `json` file of `edge` and > `vertices` type. > > I have put the f

Re: parsing embedded json in spark

2016-12-22 Thread Irving Duran
Is it an option to parse that field prior to creating the dataframe? If so, that's what I would do. As for only your master node doing work, you will have to share more about your setup: are you using Spark standalone, YARN, or Mesos? Thank You, Irving Duran On Thu, Dec 22, 2016 at 1:42 AM
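One way to parse the embedded field while staying inside Spark (available from Spark 2.1 on) is from_json; a minimal sketch with an invented schema and column names:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('{"name": "a", "count": 3}',)], ["payload"])

    schema = StructType([
        StructField("name", StringType()),
        StructField("count", IntegerType()),
    ])

    # Turn the embedded JSON string into a proper struct column.
    parsed = df.withColumn("payload", from_json(col("payload"), schema))
    parsed.select("payload.name", "payload.count").show()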

Re: Spark Batch checkpoint

2016-12-15 Thread Irving Duran
Not sure what programming language you are using, but in Python you can do sc.setCheckpointDir('~/apps/spark-2.0.1-bin-hadoop2.7/checkpoint/'). This will store checkpoints in the directory I named "checkpoint". Thank You, Irving Duran On Thu, Dec 15, 2016 at 10:33 AM, Se
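A small end-to-end sketch of batch checkpointing in PySpark (the directory and RDD are placeholders):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    rdd = sc.parallelize(range(1000)).map(lambda x: x * x)

    # Mark the RDD for checkpointing; the data is written to the checkpoint
    # directory the next time an action materializes it.
    rdd.checkpoint()
    rdd.count()
    print(rdd.isCheckpointed())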

Re: [Spark log4j] Turning off log4j while scala program runs on spark-submit

2016-12-12 Thread Irving Duran
on spark-shell (not spark-submit) -> http://stackoverflow.com/questions/27781187/how-to-stop-messages-displaying-on-spark-console Thanks for your help in advance. Thank You, Irving Duran

[Spark log4j] Turning off log4j while scala program runs on spark-submit

2016-12-09 Thread Irving Duran
on spark-shell (not spark-submit) -> http://stackoverflow.com/questions/27781187/how-to-stop-messages-displaying-on-spark-console Thanks for your help in advance. Thank You, Irving Duran
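For the spark-submit case, a hedged sketch of the in-program option: raise the log level from inside the application once the context exists (shown in PySpark for brevity; the Scala SparkContext has the same setLogLevel method, and messages printed during startup still need a log4j.properties change):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Silence INFO/WARN chatter from Spark's log4j output for the rest of
    # this application's run.
    sc.setLogLevel("ERROR")

    print(sc.parallelize([1, 2, 3]).sum())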