Re: Application not showing in Spark History

2016-08-02 Thread Sun Rui
bin/spark-submit will set some env variables, like SPARK_HOME, that Spark later uses to locate spark-defaults.conf, from which default settings for Spark will be loaded. I would guess that some configuration option like spark.eventLog.enabled in the spark-defaults.conf is skipped by
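For reference, these are the settings that typically have to be present in spark-defaults.conf (or passed explicitly) for an application to appear in the History Server; the HDFS path below is only a placeholder:

    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark-logs
    spark.history.fs.logDirectory    hdfs:///spark-logs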

Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
Thanks Jared for your kind words. I don't think I am anywhere near there yet :) In general I subtract one character before getting to "CD". That is the way debits from debit cards are marked in a bank's statement. I get an out-of-bound error if -->

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Mich Talebzadeh
I don't think it really works and it is vague. Is it rows, blocks, network? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Testing --supervise flag

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Widening to dev@spark On Mon, Aug 1, 2016 at 4:21 PM, Noorul Islam K M wrote: > > Hi all, > > I was trying to test --supervise flag of spark-submit. > > The documentation [1] says that, the flag helps in restarting your > application automatically if it exited with non-zero

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Chanh Le
I tried and it works perfectly. Regards, Chanh > On Aug 2, 2016, at 3:33 PM, Mich Talebzadeh wrote: > > OK > > Try that > > Another tedious way is to create views in Hive based on tables and use limit > on those views. > > But try that parameter first if it does

Re: sql to spark scala rdd

2016-08-02 Thread Sri
Make sense thanks. Thanks Sri Sent from my iPhone > On 2 Aug 2016, at 03:27, Jacek Laskowski wrote: > > Congrats! > > Whenever I was doing foreach(println) in the past I'm .toDF.show these > days. Give it a shot and you'll experience the feeling yourself! :) > > Pozdrawiam,

Re: sql to spark scala rdd

2016-08-02 Thread Jacek Laskowski
Congrats! Whenever I was doing foreach(println) in the past I'm .toDF.show these days. Give it a shot and you'll experience the feeling yourself! :) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Chanh Le
Hi Ayan, You mean common.max_count = 1000 (the max number of SQL results to display, to prevent browser overload)? This is a common property for all connections. It is already set by default in Zeppelin, but I think it doesn't work with Hive. DOC:

Re: java.net.UnknownHostException

2016-08-02 Thread Yang Cao
Actually, I just ran into the same problem. Could you share some code around the error? Then I can figure out whether I can help you. And is "s001.bigdata" the name of your name node? > On Aug 2, 2016, at 17:22, pseudo oduesp wrote: > > someone can help me please

Re: [MLlib] Term Frequency in TF-IDF seems incorrect

2016-08-02 Thread Nick Pentreath
Note that both HashingTF and CountVectorizer are usually used for creating TF-IDF normalized vectors. The definition ( https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Definition) of term frequency in TF-IDF is actually the "number of times the term occurs in the document". So it's perhaps a bit of a
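A minimal spark.ml sketch of the usual TF-IDF flow described above, assuming a DataFrame docs with a "text" column (the column names and pipeline shape are illustrative, not from this thread):

    import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

    // term frequency here is the raw count of (hashed) term occurrences per document
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val tf = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
    // IDF then down-weights terms that occur in many documents
    val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")

    val words = tokenizer.transform(docs)
    val counted = tf.transform(words)
    val tfidf = idf.fit(counted).transform(counted)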

Re: java.net.UnknownHostException

2016-08-02 Thread pseudo oduesp
someone can help me please 2016-08-01 11:51 GMT+02:00 pseudo oduesp : > hi > I get the following errors when I try using pyspark 2.0 with ipython on > yarn > someone can help me please . > java.lang.IllegalArgumentException: java.net.UnknownHostException: >

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread ayan guha
Zeppelin already has a param for jdbc On 2 Aug 2016 19:50, "Mich Talebzadeh" wrote: > Ok I have already set up mine > > > hive.limit.optimize.fetch.max > 5 > > Maximum number of rows allowed for a smaller subset of data for > simple LIMIT,

Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Jacek Laskowski
Congrats! You made it. A serious Spark dev badge unlocked :) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Aug 2, 2016 at 9:58 AM, Mich Talebzadeh

issue with coalesce in Spark 2.0.0

2016-08-02 Thread ??????
Hi all. I'm testing on Spark 2.0.0 and found an issue when using coalesce in my code. The procedure is simple: doing a coalesce on an RDD[String], and this happened: java.lang.NoSuchMethodError:

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Mich Talebzadeh
Ok I have already set up mine hive.limit.optimize.fetch.max 5 Maximum number of rows allowed for a smaller subset of data for simple LIMIT, if it is a fetch query. Insert queries are not restricted by this limit. I am surprised that yours was missing.

Are join/groupBy operations with wide Java Beans using Dataset API much slower than using RDD API?

2016-08-02 Thread dueckm
Hello, I built a prototype that uses join and groupBy operations via the Spark RDD API. Recently I migrated it to the Dataset API. Now it runs much slower than with the original RDD implementation. Did I do something wrong here? Or is this the price I have to pay for the more convenient API? Is

Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Jacek Laskowski
On Tue, Aug 2, 2016 at 10:59 AM, Ayoub Benali wrote: > Why writeStream is needed to consume the data ? You need to start your structured streaming (query) and there's no way to access it without DataStreamWriter => writeStream's a must.
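A bare-bones sketch of what "writeStream's a must" means in practice — nothing runs until a streaming query is started; the console sink and append mode below are placeholders, not from this thread:

    // streamingDF is the streaming Dataset obtained from readStream
    val query = streamingDF.writeStream
      .format("console")      // any supported sink
      .outputMode("append")   // without aggregations, only append mode is supported
      .start()

    query.awaitTermination()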

Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
Apologies it should read Jacek. Confused with my friend's name Jared :( Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw *

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Chanh Le
I just added to spark thrift server as it starts a param —hiveconf hive.limit.optimize.fetch.max=1000 > On Aug 2, 2016, at 4:50 PM, Mich Talebzadeh wrote: > > Ok I have already set up mine > > > hive.limit.optimize.fetch.max > 5 > >
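Spelled out, the start command would look roughly like this (install path and value are placeholders):

    $SPARK_HOME/sbin/start-thriftserver.sh --hiveconf hive.limit.optimize.fetch.max=1000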

Extracting key word from a textual column

2016-08-02 Thread Mich Talebzadeh
Hi, Need some ideas. *Summary:* I am working on a tool to slice and dice the amount of money I have spent so far (meaning the whole data sample) on a given retailer so I have a better idea of where I am wasting the money *Approach* Downloaded my bank statements from a given account in csv

unsubscribe

2016-08-02 Thread doovsaid
unsubscribe ZhangYi (张逸) BigEye website: http://www.bigeyedata.com blog: http://zhangyi.farbox.com tel: 15023157626 - Original Message - From: "zhangjp" <592426...@qq.com> To: "user" Subject: unsubscribe Date: Aug 2, 2016, 11:00 unsubscribe

decribe function limit of columns

2016-08-02 Thread pseudo oduesp
Hi, in Spark 1.5.0 I used the describe function with more than 100 columns. Can someone tell me if any limit exists now? Thanks

Re: In 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)?

2016-08-02 Thread Jacek Laskowski
Hi, Don't think so. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Aug 2, 2016 at 10:25 PM, pgb wrote: > I'm interested in

Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Sun Rui
import org.apache.spark.sql.catalyst.encoders.RowEncoder implicit val encoder = RowEncoder(df.schema) df.mapPartitions(_.take(1)) > On Aug 3, 2016, at 04:55, Dragisa Krsmanovic wrote: > > I am trying to use mapPartitions on DataFrame. > > Example: > > import
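Putting the two snippets from this thread together, a self-contained sketch (the app name and local master are assumptions):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.catalyst.encoders.RowEncoder

    val spark = SparkSession.builder().appName("mapPartitionsOnDF").master("local[*]").getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq((1, "one"), (2, "two")).toDF("id", "name")

    // a DataFrame is a Dataset[Row], so mapPartitions needs an explicit Row encoder in scope
    implicit val encoder = RowEncoder(df.schema)
    df.mapPartitions(_.take(1)).show()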

FW: Stop Spark Streaming Jobs

2016-08-02 Thread Park Kyeong Hee
So sorry. Your name was Pradeep !! -Original Message- From: Park Kyeong Hee [mailto:kh1979.p...@samsung.com] Sent: Wednesday, August 03, 2016 11:24 AM To: 'Pradeep'; 'user@spark.apache.org' Subject: RE: Stop Spark Streaming Jobs Hi. Paradeep Did you mean, how to kill the job? If yes,

Re: Extracting key word from a textual column

2016-08-02 Thread ayan guha
I would stay away from transaction tables until they are fully baked. I do not see why you need to update vs keep inserting with a timestamp and, while joining, derive the latest value on the fly. But I guess it has become a religious question now :) and I am not unbiased. On 3 Aug 2016 08:51, "Mich

Re: Extracting key word from a textual column

2016-08-02 Thread Ted Yu
+1 > On Aug 2, 2016, at 2:29 PM, Jörn Franke wrote: > > If you need to use single inserts, updates, deletes, select why not use hbase > with Phoenix? I see it as complementary to the hive / warehouse offering > >> On 02 Aug 2016, at 22:34, Mich Talebzadeh

Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
Upgraded to spark2.0 and tried to load a model: LogisticRegressionModel model = LogisticRegressionModel.load(sc.sc(), "s3a://cruncher/c/models/lr/"); Getting this error: Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://spark-warehouse, expected: file:/// Full

Re: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Sean Owen
This is https://issues.apache.org/jira/browse/SPARK-15899 -- anyone seeing this please review the proposed change. I think it's stalled and needs an update. On Tue, Aug 2, 2016 at 4:47 PM, Utkarsh Sengar wrote: > Upgraded to spark2.0 and tried to load a model: >

Re: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
I don't think it's a related problem, although setting "spark.sql.warehouse.dir"=/tmp in the Spark config fixed it. On Tue, Aug 2, 2016 at 5:02 PM, Utkarsh Sengar wrote: > Do we have a workaround for this problem? > Can I overwrite that using some config? > > On Tue, Aug 2,
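The workaround mentioned above, sketched out when building the session (the app name is a placeholder and the path is whatever local directory suits you):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("lr-model-loading")                 // placeholder name
      .config("spark.sql.warehouse.dir", "/tmp")   // workaround from this thread
      .getOrCreate()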

Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Ted Yu
Using spark-shell of master branch: scala> case class Entry(id: Integer, name: String) defined class Entry scala> val df = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry] 16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer: === Result of Batch CleanExpressions ===

Re: Spark 2.0 error: Wrong FS: file://spark-warehouse, expected: file:///

2016-08-02 Thread Utkarsh Sengar
Do we have a workaround for this problem? Can I overwrite that using some config? On Tue, Aug 2, 2016 at 4:48 PM, Sean Owen wrote: > This is https://issues.apache.org/jira/browse/SPARK-15899 -- anyone > seeing this please review the proposed change. I think it's stalled >

Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Dragisa Krsmanovic
You are converting DataFrame to Dataset[Entry]. DataFrame is Dataset[Row]. mapPartitions works fine with a plain Dataset. Just not with DataFrame. On Tue, Aug 2, 2016 at 4:50 PM, Ted Yu wrote: > Using spark-shell of master branch: > > scala> case class Entry(id: Integer,

Stop Spark Streaming Jobs

2016-08-02 Thread Pradeep
Hi All, My streaming job reads data from Kafka. The job is triggered and pushed to background with nohup. What are the recommended ways to stop job either on yarn-client or cluster mode. Thanks, Pradeep - To unsubscribe

Re: Extracting key word from a textual column

2016-08-02 Thread Yong Zhang
Well, if you still want to use windows function for your logic, then you need to derive a new column out, like "catalog", and use it as part of grouping logic. Maybe you can use regex for deriving out this new column. The implementation needs to depend on your data in
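A sketch of the "derive a new column with a regex, then use it in the grouping" idea; the DataFrame name, the debitamount column and the pattern are all made up for illustration:

    import org.apache.spark.sql.functions.{regexp_extract, sum}
    import spark.implicits._   // for the $"..." column syntax

    // txns: hypothetical DataFrame of bank transactions
    val withRetailer = txns.withColumn(
      "retailer", regexp_extract($"transactiondescription", "^([A-Za-z]+)", 1))

    withRetailer
      .groupBy($"retailer")
      .agg(sum($"debitamount").as("total_spent"))
      .show()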

Fwd: Saving input schema along with PipelineModel

2016-08-02 Thread Satya Narayan1
Hi All, Is there any way I can save Input schema along with ml PipelineModel object? This feature will be really helpful while loading the model and running transform, as user can get back the schema , prepare the dataset for model.transform and don't need to remember it. I see below jira talks

Saving input schema along with PipelineModel

2016-08-02 Thread Satyanarayan Patel
Hi All, Is there any way I can save Input schema along with ml PipelineModel object? This feature will be really helpful while loading the model and running transform, as user can get back the schema , prepare the dataset for model.transform and don't need to remember it. I see below jira talks

Re: Extracting key word from a textual column

2016-08-02 Thread Jörn Franke
I agree with you. > On 03 Aug 2016, at 01:20, ayan guha wrote: > > I would stay away from transaction tables until they are fully baked. I do > not see why you need to update vs keep inserting with timestamp and while > joining derive latest value on the fly. > > But I

Re: Extracting key word from a textual column

2016-08-02 Thread Jörn Franke
Phoenix will become another standard query interface of hbase. I do not agree that using hbase directly will lead to a faster performance. It always depends how you use it. While it is another component, it can make sense to use it. This has to be evaluated on a case by case basis. If you only

Spark on yarn, only 1 or 2 vcores getting allocated to the containers getting created.

2016-08-02 Thread satyajit vegesna
Hi All, I am trying to run a spark job using yarn, and I specify the --executor-cores value as 20. But when I go and check the "nodes of the cluster" page in http://hostname:8088/cluster/nodes I see 4 containers getting created on each of the nodes in the cluster. But I can only see 1 vcore getting

Re: calling dataset.show on a custom object - displays toString() value as first column and blank for rest

2016-08-02 Thread Jacek Laskowski
On Sun, Jul 31, 2016 at 4:16 PM, Rohit Chaddha wrote: > I have a custom object called A and corresponding Dataset > > when I call datasetA.show() method i get the following How do you create datasetA? How does A look like? Jacek

RE: Stop Spark Streaming Jobs

2016-08-02 Thread Park Kyeong Hee
Hi. Paradeep Did you mean, how to kill the job? If yes, you should kill the driver and follow next. on yarn-client 1. find pid - "ps -es | grep " 2. kill it - "kill -9 " 3. check executors were down - "yarn application -list" on yarn-cluster 1. find driver's application ID - "yarn application

FW: [jupyter] newbie. apache spark python3 'Jupyter' data frame problem with auto completion and accessing documentation

2016-08-02 Thread Andy Davidson
FYI From: on behalf of Thomas Kluyver Reply-To: Date: Tuesday, August 2, 2016 at 3:26 AM To: Project Jupyter Subject: Re: [jupyter] newbie. apache spark python3 'Jupyter' data frame problem

Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Liangzhao Zeng
Hi, I migrated my code to Spark 2.0 from 1.6. It finishes the last stage (and the result is correct) but then gets the following errors and starts over. Any idea on what happened? 16/08/02 16:59:33 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event

Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread freedafeng
Solution: sc._jsc.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", "...") sc._jsc.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", "...") Got this solution from a cloudera lady. Thanks Neerja. -- View this message in context:
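The same workaround from a Scala driver, keeping the property names used in this thread (whether those exact names are honoured depends on your Hadoop/s3a version — treat them as an assumption):

    // mirrors the PySpark workaround above; bucket and path are placeholders
    sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val lines = sc.textFile("s3a://some-bucket/some/prefix/*")
    println(lines.count())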

Re: Extracting key word from a textual column

2016-08-02 Thread Sonal Goyal
Hi Mich, It seems like an entity resolution problem - looking at different representations of an entity - SAINSBURY in this case and matching them all together. How dirty is your data in the description - are there stop words like SACAT/SMKT etc you can strip off and get the base retailer entity

Re: Extracting key word from a textual column

2016-08-02 Thread Mich Talebzadeh
Thanks. I believe there is some catalog of companies that I can get and store in a table and match the company name to the transactiondescription column. That catalog should have sectors in it. For example company XYZ is under Grocers etc, which will make search and grouping much easier. I believe

Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread freedafeng
Any one, please? I believe many of us are using spark 1.6 or higher with s3... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-read-s3-files-error-tp27417p27451.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: spark 1.6.0 read s3 files error.

2016-08-02 Thread Andy Davidson
Hi Freedafeng I have been reading and writing to s3 using spark-1.6.x with out any problems. Can you post a little code example and any error messages? Andy From: freedafeng Date: Tuesday, August 2, 2016 at 9:26 AM To: "user @spark" Subject:

Re: Spark GraphFrames

2016-08-02 Thread Denny Lee
Hi Divya, Here's a blog post concerning On-Time Flight Performance with GraphFrames: https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html It also includes a Databricks notebook that has the code in it. HTH! Denny On Tue, Aug 2, 2016 at 1:16

Re: What are using Spark for

2016-08-02 Thread Karthik Ramakrishnan
We used Storm for ETL, now currently thinking Spark might be advantageous since some ML also is coming our way. - Karthik On Tue, Aug 2, 2016 at 1:10 PM, Rohit L wrote: > Does anyone use Spark for ETL? > > On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal

Re: What are using Spark for

2016-08-02 Thread Rohit L
Does anyone use Spark for ETL? On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote: > Hi Rohit, > > You can check the powered by spark page for some real usage of Spark. > > https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark > > > On Tuesday, August 2, 2016,

Re: What are using Spark for

2016-08-02 Thread Deepak Sharma
Yes.I am using spark for ETL and I am sure there are lot of other companies who are using spark for ETL. Thanks Deepak On 2 Aug 2016 11:40 pm, "Rohit L" wrote: > Does anyone use Spark for ETL? > > On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote:

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Ted Yu
Which hadoop version are you using ? Can you show snippet of your code ? Thanks On Tue, Aug 2, 2016 at 10:06 AM, Liangzhao Zeng wrote: > Hi, > > > I migrate my code to Spark 2.0 from 1.6. It finish last stage (and result is > correct) but get following errors then

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread dhruve ashar
Hi LZ, Getting those error messages in logs is normal behavior. When the job completes, it shuts down the SparkListenerBus as there's no need of relaying any spark events to the interested registered listeners. So trying to add events from executors which are yet to shutdown, logs the error

Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
OK, answering myself - this is broken since 1.6.2 by SPARK-13845 On Tue, Aug 2, 2016 at 12:10 AM, Andrei Ivanov wrote: > Hi all, > > I've just tried upgrading Spark to 2.0 and so far it looks generally good. > > But there

Re: Spark 2.0 History Server Storage

2016-08-02 Thread Andrei Ivanov
1. SPARK-16859 submitted On Tue, Aug 2, 2016 at 9:07 PM, Andrei Ivanov wrote: > OK, answering myself - this is broken since 1.6.2 by SPARK-13845 > > > On Tue, Aug

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Liangzhao Zeng
It tries to execute the job again, from the first stage. Sent from my iPhone > On Aug 2, 2016, at 11:24 AM, dhruve ashar wrote: > > Hi LZ, > > Getting those error messages in logs is normal behavior. When the job > completes, it shuts down the SparkListenerBus as there's no need of

Re: What are using Spark for

2016-08-02 Thread Mich Talebzadeh
Hi, If I may say, if you spend sometime going through this mailing list in this forum and see the variety of topics that users are discussing, then you may get plenty of ideas about Spark application in real life.. HTH Dr Mich Talebzadeh LinkedIn *

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread dhruve ashar
Can you provide additional logs? On Tue, Aug 2, 2016 at 2:13 PM, Liangzhao Zeng wrote: > It is 2.6 and code is very simple. I load data file from Hdfs to create > rdd then same some samples. > > > Thanks > > Sent from my iPhone > > On Aug 2, 2016, at 11:01 AM, Ted Yu

Re: What are using Spark for

2016-08-02 Thread Daniel Siegmann
Yes, you can use Spark for ETL, as well as feature engineering, training, and scoring. ~Daniel Siegmann On Tue, Aug 2, 2016 at 3:29 PM, Mich Talebzadeh wrote: > Hi, > > If I may say, if you spend sometime going through this mailing list in > this forum and see the

saving data frame to optimize joins at a later time

2016-08-02 Thread Cesar
Hi all: I wonder if there is a way to save a table in order to optimize joins at a later time. For example if I do something like: val df = anotherDF.repartition("id") // some data frame df.registerTempTable("tableAlias") hiveContext.sql( "INSERT INTO whse.someTable SELECT * FROM tableAlias

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Liangzhao Zeng
It is 2.6 and the code is very simple. I load a data file from HDFS to create an rdd then same some samples. Thanks Sent from my iPhone > On Aug 2, 2016, at 11:01 AM, Ted Yu wrote: > > Which hadoop version are you using ? > > Can you show snippet of your code ? > > Thanks > >> On Tue, Aug 2,

In 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)?

2016-08-02 Thread pgb
I'm interested in learning if it's possible to grab the results set from a query run on an external database as opposed to grabbing the full table and manipulating it later. The base code I'm working with is below (using Spark 2.0.0): ``` from pyspark.sql import SparkSession df = spark.read\
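Not answered in this thread, but for reference: the JDBC data source accepts a parenthesized subquery wherever a table name goes, which is the usual way to fetch only a query's result set; a sketch with made-up connection details:

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")                            // placeholder URL
      .option("dbtable", "(SELECT id, amount FROM orders WHERE amount > 100) AS q")   // subquery instead of a full table
      .option("user", "...")
      .option("password", "...")
      .load()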

Re: Extracting key word from a textual column

2016-08-02 Thread Mich Talebzadeh
Hi, I decided to create a catalog table in Hive, ORC and transactional. That table has two columns of value: 1. transactiondescription === account_table.transactiondescription 2. hashtag, a String column created from a semi-automated process of deriving it from

Re: decribe function limit of columns

2016-08-02 Thread janardhan shetty
If you are referring to limiting the # of columns, you can select the columns and then describe: df.select("col1", "col2").describe().show() On Tue, Aug 2, 2016 at 6:39 AM, pseudo oduesp wrote: > Hi > in spark 1.5.0 i used descibe function with more than 100 columns . > someone

[2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Dragisa Krsmanovic
I am trying to use mapPartitions on DataFrame. Example: import spark.implicits._ val df: DataFrame = Seq((1,"one"), (2, "two")).toDF("id", "name") df.mapPartitions(_.take(1)) I am getting: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product

Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Chanh Le
Hi everyone, I set up STS and use Zeppelin to query data through a JDBC connection. A problem we are facing is that users usually forget to put a limit in the query, so it hangs the cluster. SELECT * FROM tableA; Is there any way to configure the limit by default? Regards, Chanh

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Mich Talebzadeh
This is a classic problem on any RDBMS Set the limit on the number of rows returned like maximum of 50K rows through JDBC What is your JDBC connection going to? Meaning which RDBMS if any? HTH Dr Mich Talebzadeh LinkedIn *

Re: Spark GraphFrames

2016-08-02 Thread Kazuaki Ishizaki
Hi, Kay wrote a procedure to use GraphFrames with Spark. https://gist.github.com/kayousterhout/7008a8ebf2babeedc7ce6f8723fd1bf4 Kazuaki Ishizaki From: Divya Gehlot To: "user @spark" Date: 2016/08/02 14:52 Subject:Spark

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Chanh Le
Hi Mich, I use Spark Thrift Server basically it acts like Hive. I see that there is property in Hive. > hive.limit.optimize.fetch.max > Default Value: 5 > Added In: Hive 0.8.0 > Maximum number of rows allowed for a smaller subset of data for simple LIMIT, > if it is a fetch query. Insert

Re: Spark GraphFrames

2016-08-02 Thread Kazuaki Ishizaki
Sorry, please ignore this mail. Sorry for my misinterpretation of GraphFrames in Spark. I thought it was the frame graph of a profiling tool. Kazuaki Ishizaki, From: Kazuaki Ishizaki/Japan/IBM@IBMJP To: Divya Gehlot Cc: "user @spark" Date:

Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
No thinking on my part!!! rs.select(mySubstr($"transactiondescription", lit(1), instr($"transactiondescription", "CD"))).show(2) +--+ |UDF(transactiondescription,1,instr(transactiondescription,CD))|

Re: What are using Spark for

2016-08-02 Thread Sonal Goyal
Hi Rohit, You can check the powered by spark page for some real usage of Spark. https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark On Tuesday, August 2, 2016, Rohit L wrote: > Hi Everyone, > > I want to know the real world uses cases for which

Re: Tuning level of Parallelism: Increase or decrease?

2016-08-02 Thread Sonal Goyal
Hi Jestin, Which of your actions is the bottleneck? Is it the group by, the count or the join? Or all of them? It may help to tune the most time-consuming task first. On Monday, August 1, 2016, Nikolay Zhebet wrote: > Yes, Spark always trying to deliver snippet of code to the data

Re: The equivalent for INSTR in Spark FP

2016-08-02 Thread Mich Talebzadeh
it should be lit(0) :) rs.select(mySubstr($"transactiondescription", lit(0), instr($"transactiondescription", "CD"))).show(1) +--+ |UDF(transactiondescription,0,instr(transactiondescription,CD))|
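mySubstr is a user-defined function from earlier in this thread whose definition is not quoted here; a hypothetical reconstruction, assuming it returns the text from start up to one character before the position instr finds (per the "subtract one character before getting to CD" remark above):

    import org.apache.spark.sql.functions.{instr, lit, udf}

    // hypothetical stand-in for the mySubstr UDF referenced in this thread
    val mySubstr = udf { (s: String, start: Int, end: Int) =>
      if (s != null && end > start) s.substring(start, end - 1) else s
    }

    rs.select(mySubstr($"transactiondescription", lit(0), instr($"transactiondescription", "CD"))).show(1)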

Re: Does it has a way to config limit in query on STS by default?

2016-08-02 Thread Mich Talebzadeh
OK Try that Another tedious way is to create views in Hive based on tables and use limit on those views. But try that parameter first if it does anything. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Ayoub Benali
Hello, here is the code I am trying to run: https://gist.github.com/ayoub-benali/a96163c711b4fce1bdddf16b911475f2 Thanks, Ayoub. 2016-08-01 13:44 GMT+02:00 Jacek Laskowski : > On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali > wrote: > > > the

Application not showing in Spark History

2016-08-02 Thread Rychnovsky, Dusan
Hi, I am trying to launch my Spark application from within my Java application via the SparkSubmit class, like this: List<String> args = new ArrayList<>(); args.add("--verbose"); args.add("--deploy-mode=cluster"); args.add("--master=yarn"); ... SparkSubmit.main(args.toArray(new

Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Ayoub Benali
Why writeStream is needed to consume the data ? When I tried it I got this exception: INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint > org.apache.spark.sql.AnalysisException: Complete output mode not supported > when there are no streaming aggregations on streaming

Re: Application not showing in Spark History

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Have you tried https://github.com/spark-jobserver/spark-jobserver On Tue, Aug 2, 2016 at 2:23 PM, Rychnovsky, Dusan wrote: > Hi, > > > I am trying to launch my Spark application from within my Java application > via the SparkSubmit class, like this: > > > > List

Re: Stop Spark Streaming Jobs

2016-08-02 Thread Pradeep
Thanks Park. I am doing the same. Was trying to understand if there are other ways. Thanks, Pradeep > On Aug 2, 2016, at 10:25 PM, Park Kyeong Hee wrote: > > So sorry. Your name was Pradeep !! > > -Original Message- > From: Park Kyeong Hee