Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread hemant singh
Hello Mich, I will be more than happy to contribute to this. Thanks, Hemant On Thu, Mar 26, 2020 at 7:11 PM Mich Talebzadeh wrote: > Hi all, > > Do you think we can create a global solution in the cloud using > volunteers like us and third-party employees? What I have in mind is to > create

Re: RESTful Operations

2020-01-20 Thread hemant singh
Livy has both statement-based (Scala) as well as batch processing (packaged code jar). I think the first, statement-based approach is what you might want to look at. The data has to be residing in some source. Thanks, Hemant On Mon, 20 Jan 2020 at 2:04 PM, wrote: > Sorry, didn't explain well. Livy seems to
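
A minimal sketch of the statement-based flow over Livy's REST API (the host, port, and submitted snippet are assumptions for illustration, not from the thread):

    import json, time
    import requests

    livy = "http://localhost:8998"          # assumed Livy host/port
    headers = {"Content-Type": "application/json"}

    # Create an interactive session (use kind="pyspark" for Python code).
    r = requests.post(livy + "/sessions", headers=headers,
                      data=json.dumps({"kind": "spark"}))
    session_url = livy + r.headers["Location"]

    # Wait for the session to become idle before submitting code.
    while requests.get(session_url, headers=headers).json()["state"] != "idle":
        time.sleep(2)

    r = requests.post(session_url + "/statements", headers=headers,
                      data=json.dumps({"code": "sc.parallelize(1 to 10).sum()"}))
    print(requests.get(livy + r.headers["Location"], headers=headers).json())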

Re: [pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread hemant singh
ory property. > > Also some more information on the job, I have about 4 window functions on > this dataset before it gets written out. > > Any other ideas? > > Thanks, > -Shraddha > > On Sun, Jan 5, 2020 at 11:06 PM hemant singh wrote: > >> You can try increas

Re: [pyspark2.4+] A lot of tasks failed, but job eventually completes

2020-01-05 Thread hemant singh
You can try increasing the executor memory; generally this error comes when there is not enough memory in the individual executors. The job may be getting completed because the re-scheduled tasks eventually go through. Thanks. On Mon, 6 Jan 2020 at 5:47 AM, Rishi Shah wrote: > Hello All, >
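
For reference, a sketch of raising executor memory at session creation; the figures are placeholders, not tuned recommendations:

    from pyspark.sql import SparkSession

    # Executor memory is fixed when executors launch, so set it at build
    # time (or with spark-submit --executor-memory), not at runtime.
    spark = (SparkSession.builder
             .appName("memory-bump")
             .config("spark.executor.memory", "8g")          # per-executor heap
             .config("spark.executor.memoryOverhead", "2g")  # off-heap overhead
             .getOrCreate())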

Re: Spark onApplicationEnd run multiple times during the application failure

2019-11-21 Thread hemant singh
why is the retry happening at the whole > application level? > > To my understanding, the retry can be done at the job level. > > Jacky > > > > > > *From:* hemant singh [mailto:hemant2...@gmail.com] > *Sent:* November 21, 19 3:12 AM > *To:* Jiang, Yi J (CWM-NR) > *Cc:* M

Re: Spark onApplicationEnd run multiple times during the application failure

2019-11-21 Thread hemant singh
Could it be because of a retry? Thanks On Thu, 21 Nov 2019 at 3:35 AM, Jiang, Yi J (CWM-NR) wrote: > Hello > > We are running into an issue. > > We have customized the SparkListener class and added it to the spark > context. But when the spark job fails, we find the “onApplicationEnd” >
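
If the cause is an application-level retry on YARN (an assumption; the thread does not name the cluster manager), capping the attempts is one way to confirm it:

    from pyspark.sql import SparkSession

    # With a single application attempt, a failed application is not
    # re-run, so a listener's onApplicationEnd should fire exactly once.
    spark = (SparkSession.builder
             .appName("listener-retry-check")
             .config("spark.yarn.maxAppAttempts", "1")
             .getOrCreate())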

Re: Spark - configuration setting doesn't work

2019-10-27 Thread hemant singh
You should add the configurations while creating the session; I don’t think you can override them once the session is created. A few can be, though. Thanks, Hemant On Sun, 27 Oct 2019 at 11:02 AM, Chetan Khatri wrote: > Could someone please help me. > > On Thu, Oct 17, 2019 at 7:29 PM Chetan Khatri >
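
A short illustration of the difference (the property names are just examples):

    from pyspark.sql import SparkSession

    # Static settings such as executor memory only take effect when they
    # are supplied at session creation time:
    spark = (SparkSession.builder
             .config("spark.executor.memory", "4g")
             .getOrCreate())

    # On a live session only runtime SQL settings can be changed; static
    # settings set this way are ignored:
    spark.conf.set("spark.sql.shuffle.partitions", "400")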

Re: Read text file row by row and apply conditions

2019-09-30 Thread hemant singh
You can use the CSV reader with '|' as the delimiter to split the data and create a dataframe on top of the file data. As a second step, filter the dataframe on a column value, like indicator type = A, D, etc., and put it in tables. For saving to tables you can use DataFrameWriter (not sure what your destination DB type
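
A rough sketch of those steps, assuming a header-less file, a leading 'indicator' column, and a JDBC destination (all assumptions; the thread leaves the destination open):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Step 1: read the pipe-delimited file into a dataframe.
    df = (spark.read
          .option("delimiter", "|")
          .csv("/path/to/input.txt")
          .toDF("indicator", "col1", "col2"))  # assumed column names

    # Step 2: filter on the indicator value.
    adds = df.filter(df.indicator == "A")
    deletes = df.filter(df.indicator == "D")

    # Step 3: write each slice to its own table (JDBC shown as one option).
    adds.write.jdbc(url="jdbc:postgresql://host/db", table="adds_table",
                    mode="append",
                    properties={"user": "u", "password": "p"})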

Spark foreach retry

2019-05-09 Thread hemant singh
Hi, I want to know what happens if foreach fails for some record. Does foreach retry like any general task, i.e. does it retry 4 times? Say I am pushing some payload to an API; if it fails for some record, will that record get retried, or is it bypassed while the rest of the records are processed? Thanks, Hemant
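
For what it's worth, foreach runs as ordinary tasks, so an uncaught exception fails the whole task and Spark retries it up to spark.task.maxFailures (4 by default), replaying every record in that partition. To bypass bad records instead, catch per-record errors yourself; a sketch with a hypothetical endpoint:

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "payload"])

    def post_partition(rows):
        for row in rows:
            try:
                resp = requests.post("https://api.example.com/ingest",  # hypothetical API
                                     json=row.asDict(), timeout=10)
                resp.raise_for_status()
            except requests.RequestException as err:
                # Swallowing the error skips this record instead of
                # failing (and replaying) the entire task.
                print("skipped record %s: %s" % (row.id, err))

    df.rdd.foreachPartition(post_partition)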

Re: write files of a specific size

2019-05-05 Thread hemant singh
Based on the size of the output data you can do the math on how many files you will need to produce 100MB files. Once you have the number of files, you can do coalesce or repartition depending on whether your job writes more or fewer output partitions. On Sun, 5 May 2019 at 2:21 PM, rajat kumar wrote: >
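
A back-of-the-envelope sketch of that math; the total output size here is a figure you would measure or estimate yourself:

    import math
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("/path/to/input")  # assumed source

    total_output_mb = 2048   # estimated size of the output data (assumption)
    target_file_mb = 100
    num_files = max(1, math.ceil(total_output_mb / target_file_mb))

    # coalesce to shrink the partition count, repartition to grow it
    if df.rdd.getNumPartitions() > num_files:
        df = df.coalesce(num_files)
    else:
        df = df.repartition(num_files)
    df.write.parquet("/path/to/output")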

Re: Spark Kafka Batch Write guarantees

2019-04-01 Thread hemant singh
Wed, Mar 27, 2019 at 1:15 AM hemant singh wrote: > >> We are using spark batch to write a Dataframe to a Kafka topic, via the spark >> write function with write.format("kafka"). >> Does spark provide a similar guarantee to the one it provides when saving a >> dataframe to disk; that p

Spark Kafka Batch Write guarantees

2019-03-27 Thread hemant singh
We are using spark batch to write a Dataframe to a Kafka topic, via the write function with write.format("kafka"). Does spark provide a guarantee similar to the one it provides when saving a dataframe to disk, i.e. that partial data is not written to Kafka: either the full dataframe is saved or, if the job fails, no
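
For reference, a minimal sketch of the batch write being described (broker and topic are placeholders). The Kafka sink expects string or binary 'key' and 'value' columns; as far as I know it is not transactional, so a failed job can leave some records already published:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "hello"), (2, "world")], ["id", "msg"])

    # Shape the dataframe into the key/value columns the sink requires.
    (df.select(F.col("id").cast("string").alias("key"),
               F.col("msg").cast("string").alias("value"))
       .write
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
       .option("topic", "my_topic")                        # placeholder
       .save())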

Re: Spark DataFrame/DataSet Wide Transformations

2019-02-06 Thread hemant singh
The same concept applies to Dataframes as to RDDs with respect to transformations; both are distributed data sets. Thanks On Thu, Feb 7, 2019 at 8:51 AM Faiz Chachiya wrote: > Hello Team, > > With RDDs it is pretty clear which operations would result in wide > transformations, and there are
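
A small illustration (the data is made up): operations that can stay within a partition are narrow, while anything that must regroup rows by key shuffles, exactly as with RDDs:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["k", "v"])

    narrow = df.filter(df.v > 1).withColumn("v2", df.v * 2)   # no shuffle
    wide = df.groupBy("k").agg(F.sum("v").alias("total"))     # shuffle

    wide.explain()  # the plan shows an Exchange node for the wide step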

Re: Create all the combinations of a groupBy

2019-01-23 Thread hemant singh
Check the rollup and cube functions in Spark SQL. On Wed, 23 Jan 2019 at 10:47 PM, Pierremalliard < pierre.de-malli...@capgemini.com> wrote: > Hi, > > I am trying to generate a dataframe of all combinations that have the same > key > using Pyspark. > > example: > > (a,1) > (a,2) > (a,3) > (b,1) >
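
A quick sketch of both (the data is made up): rollup produces the hierarchical subtotal groupings, cube every combination of the grouping columns:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 1)], ["key", "val"])

    # rollup groups by (key, val), (key), and ()
    df.rollup("key", "val").agg(F.count("*").alias("n")).show()

    # cube groups by (key, val), (key), (val), and ()
    df.cube("key", "val").agg(F.count("*").alias("n")).show()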

Re: use spark cluster in java web service

2018-11-01 Thread hemant singh
Why don't you explore Livy? You can use the REST API to submit the jobs - https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html On Thu, Nov 1, 2018 at 12:52 PM 崔苗 (Data and AI Product Development Department) <0049003...@znv.com> wrote: > Hi, > we want to use spark in our
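
A minimal sketch of a batch submission against Livy's REST API in the spirit of the linked article (host, jar path, and class name are placeholders):

    import json
    import requests

    payload = {
        "file": "hdfs:///jars/your-app.jar",   # placeholder application jar
        "className": "com.example.YourApp",    # placeholder main class
        "args": ["arg1"],
    }
    r = requests.post("http://livy-host:8998/batches",  # placeholder endpoint
                      headers={"Content-Type": "application/json"},
                      data=json.dumps(payload))
    print(r.json())  # returns a batch id; poll /batches/<id> for its state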

Re: How to do efficient self join with Spark-SQL and Scala

2018-09-21 Thread hemant singh
You can use the Spark dataframe 'when'/'otherwise' clause to replace the SQL CASE statement. This piece will need to be calculated beforehand - 'select student_id from tbl_student where candidate_id = c.candidate_id and approval_id = 2 and academic_start_date is null'. Take the count of the above DF after
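
A small sketch of the when/otherwise pattern itself, with columns borrowed loosely from the quoted SQL (the full join logic is cut off in the snippet):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    students = spark.createDataFrame(
        [(1, 2, None), (2, 1, "2018-01-01")],
        ["student_id", "approval_id", "academic_start_date"])

    # Flag the rows the quoted WHERE clause would match, CASE-style.
    flagged = students.withColumn(
        "pending",
        F.when((F.col("approval_id") == 2) &
               F.col("academic_start_date").isNull(), 1).otherwise(0))
    flagged.show()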

Re: Launch a pyspark Job From UI

2018-06-11 Thread hemant singh
You can explore Livy - https://dzone.com/articles/quick-start-with-apache-livy On Mon, Jun 11, 2018 at 3:35 PM, srungarapu vamsi wrote: > Hi, > > I am looking for applications where we can trigger spark jobs from a UI. > Are there any such applications available? > > I have checked Spark-jobserver

Re: Aggregation of Streaming UI Statistics for multiple jobs

2018-05-27 Thread hemant singh
You can explore the REST API - https://spark.apache.org/docs/2.0.2/monitoring.html#rest-api On Sun, May 27, 2018 at 10:18 AM, skmishra wrote: > Hi, > > I am working on a streaming use case where I need to run multiple spark > streaming applications at the same time
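
A minimal sketch of aggregating job counts across applications via that API (the host assumes a live driver UI on its default port; a history server usually listens on 18080):

    import requests

    base = "http://driver-host:4040/api/v1"  # placeholder host

    for app in requests.get(base + "/applications").json():
        jobs = requests.get("%s/applications/%s/jobs" % (base, app["id"])).json()
        succeeded = sum(1 for j in jobs if j["status"] == "SUCCEEDED")
        print(app["id"], app["name"], "succeeded jobs:", succeeded)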

Re: Spark 2.3 Tree Error

2018-05-26 Thread hemant singh
Per the SQL plan, this is where it is failing: "Attribute(s) with the same name appear in the operation: fnlwgt_bucketed. Please check if the right attribute(s) are used." On Sat, May 26, 2018 at 6:16 PM, Aakash Basu wrote: > Hi, > > This query is based on one step
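
This error usually surfaces when the same attribute ends up on both sides of a plan, e.g. a self-join on a derived column; renaming one side beforehand is a common workaround (a generic sketch, since the thread's actual pipeline is not shown):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10.0)], ["id", "fnlwgt_bucketed"])

    # Rename one side so the attribute name is unambiguous in the plan.
    right = df.withColumnRenamed("fnlwgt_bucketed", "fnlwgt_bucketed_r")
    joined = df.join(right, "id")
    joined.show()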

Re: pyspark execution

2018-04-17 Thread hemant singh
If it contains only SQL then you can use a function like the one below -

    import subprocess

    def run_sql(sql_file_path, your_db_name, location):
        # hivevars are passed to spark-sql as name=value pairs
        subprocess.call(["spark-sql", "-S",
                         "--hivevar", "DB_NAME=" + your_db_name,  # hivevar name assumed
                         "--hivevar", "LOCATION=" + location,
                         "-f", sql_file_path])

If you have other pieces like Spark code and not only SQL

Re: Access Table with Spark Dataframe

2018-03-20 Thread hemant singh
See if this helps - https://stackoverflow.com/questions/42852659/makiing-sql-request-on-columns-containing-dot - enclose the column names in backticks (`). On Tue, Mar 20, 2018 at 6:47 PM, SNEHASISH DUTTA wrote: > Hi, > > I am using Spark 2.2; a table fetched from the database contains
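
For example (the column name is made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,)], ["a.b"])  # column name containing a dot

    # Backticks stop Spark from parsing the dot as nested-field access.
    df.select("`a.b`").show()
    df.selectExpr("`a.b` AS a_b").show()  # or rename it once up front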