>> In the case of Structured Streaming it should be a file location.
>> But the main question is: why do you want to checkpoint in
>> NoSQL, when it's eventually consistent?
>>
>>
>> Regards
>> Amit
>>
>> On Sunday, September 27, 2020, Debabrata Ghosh
>> wrote:
Hi,
I had a query around Spark checkpoints: can I store the checkpoints in
NoSQL or Kafka instead of the filesystem?
Regards,
Debu
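A minimal sketch of the filesystem checkpoint the reply above describes. The path, query names, and the helper are illustrative assumptions, not the poster's code; `events` stands for any streaming DataFrame from an existing SparkSession.

```python
# Sketch only: assumes a streaming DataFrame `events` produced by an
# existing SparkSession; paths and names are hypothetical.
def start_with_checkpoint(events, checkpoint_dir="hdfs:///tmp/checkpoints/my_query"):
    """Start a streaming query whose checkpoint lives on an
    HDFS-compatible filesystem (HDFS, S3A, local file://) -- not in
    NoSQL or Kafka."""
    return (events.writeStream
            .format("parquet")
            .option("path", "hdfs:///tmp/output/my_query")
            .option("checkpointLocation", checkpoint_dir)  # must be a filesystem URI
            .start())

# Illustrative sanity check on the URI scheme before starting the query
# (the scheme list is an assumption, not exhaustive).
def is_filesystem_uri(uri):
    scheme = uri.split("://", 1)[0] if "://" in uri else "file"
    return scheme in {"hdfs", "s3a", "file", "abfs", "gs"}
```

A bare local path counts as `file://`, which is fine for testing but not fault-tolerant on a cluster.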
Hi,
I needed some help from you on the attached Spark problem please.
I am running the following query:
>>> df_location = spark.sql("""select dt from ql_raw_zone.ext_ql_location
where ( lat between 41.67 and 45.82) and (lon between -86.74 and -82.42
) and year=2020 and month=9 and
Hi All,
We have a static DataFrame as follows:
+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527851|
|  2|1540525602|
|  3|1530529187|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+
We also have a live stream of events, a streaming DataFrame which contains id
and updated
Any solution please ?
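One common pattern for this shape of problem is a stream-static join on `id`. The sketch below is an assumption about the intent (the question is truncated before it states the goal); `apply_updates` shows the likely merge semantics, keeping the greatest `time_stamp` per id, in plain Python.

```python
# Sketch, not the poster's code: `stream_df` and `static_df` are assumed
# to both carry `id` and `time_stamp` columns.
def join_stream_with_static(stream_df, static_df):
    """Enrich each streaming event with the static snapshot for its id
    (a stream-static inner join)."""
    return stream_df.join(static_df, on="id", how="inner")

# The presumed "update" semantics, illustrated in plain Python:
# for each id, keep the greatest time_stamp seen so far.
def apply_updates(snapshot, updates):
    merged = dict(snapshot)
    for _id, ts in updates:
        if _id not in merged or ts > merged[_id]:
            merged[_id] = ts
    return merged
```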
On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh
wrote:
> Hi,
> I have a Spark Streaming application where Kafka is producing
> records but unfortunately Spark Streaming isn't able to consume them.
>
> I am hitting the following error:
>
> 20
On Fri, Apr 10, 2020 at 11:14 PM Srinivas V wrote:
> Check if your broker details are correct, verify if you have network
> connectivity to your client box and Kafka broker server host.
>
> On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh
> wrote:
>
>> Hi,
>>
Hi,
I have a Spark Streaming application where Kafka is producing
records but unfortunately Spark Streaming isn't able to consume them.
I am hitting the following error:
20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
java.lang.AssertionError: assertion
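Following the advice in the reply above (verify broker details and network reachability), this is roughly what the Structured Streaming Kafka source looks like. Broker addresses and the topic name are placeholders, not values from the thread.

```python
def read_from_kafka(spark, brokers="broker1:9092,broker2:9092", topic="my_topic"):
    """Structured Streaming Kafka source. `kafka.bootstrap.servers`
    must be reachable from every executor, not just the driver."""
    return (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", brokers)
            .option("subscribe", topic)
            .option("startingOffsets", "earliest")  # also read the backlog
            .load())

def kafka_options(brokers, topic):
    # The minimal option set the source needs (illustrative helper).
    return {"kafka.bootstrap.servers": brokers, "subscribe": topic}
```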
Greetings All !
I have got plenty of application directories lying around under .sparkStaging,
such as .sparkStaging/application_1580703507814_0074.
Would you please be able to advise which variable I need to set in
spark-env.sh so that the sparkStaging application directories aren't preserved
after the
Greetings All !
I needed some help in obtaining the application logs, but I am really
confused about where they are currently located. Please allow me to explain
my problem:
1. I am running the Spark application (written in Java) in a Hortonworks
Data Platform Hadoop cluster
2. My spark-submit command is
Hi ,
I am using Hortonworks Data Platform 3.1. I am unable to
write data from Spark into a Hive managed table, but am able to do so into a
Hive external table.
Would you please help me with a resolution?
Thanks,
Debu
Hi,
Please can you let me know which of the following options
would be best practice for writing data into a Hive table:
Option 1:
outputDataFrame.write
.mode(SaveMode.Overwrite)
.format("csv")
.save("hdfs_path")
Option 2: Get the data from a dataframe and
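For comparison with Option 1 above, writing through the metastore is usually preferable to dumping CSV at a raw HDFS path, since Hive then manages the schema and location. A hedged PySpark sketch (the table name is a placeholder):

```python
def save_mode(overwrite):
    # Maps a boolean to the DataFrameWriter mode string.
    return "overwrite" if overwrite else "append"

def write_to_hive(output_df, table="mydb.my_table", overwrite=True):
    """Write through the Hive metastore so schema and location are
    managed by Hive rather than by a hard-coded HDFS path."""
    (output_df.write
     .mode(save_mode(overwrite))
     .format("orc")            # ORC/Parquet preserve types; CSV loses them
     .saveAsTable(table))
```

For a table that already exists with a fixed schema, `insertInto(table)` is the usual alternative to `saveAsTable`.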
Hi Everyone,
I have been trying to run spark-shell in YARN client mode, but am getting a
lot of ClosedChannelException errors; however, the program works fine in
local mode. I am using Spark 2.2.0 built for Hadoop 2.7.3. If you are
familiar with this error, please can you help with the possible
> /10/30/introducing-vectorized-udfs-for-pyspark.html
>
>
>
> On Mar 18, 2018, at 10:54 PM, Debabrata Ghosh <mailford...@gmail.com>
> wrote:
>
> Hi,
> My dataframe is having 2000 row
Hi,
My dataframe has 2000 rows. Processing each row takes
3 seconds, so sequentially it takes 2000 * 3 = 6000 seconds,
which is a very high time.
I am therefore contemplating running the function in parallel.
For example, I would like to divide the
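The arithmetic above generalises: with N partitions running concurrently, the idealised wall-clock bound drops from 2000 * 3 s to roughly ceil(2000 / N) * 3 s. A sketch of both the estimate and the usual Spark pattern (`row_fn` stands for the slow per-row function, an assumption):

```python
import math

def estimated_wall_clock(n_rows, secs_per_row, n_partitions):
    """Idealised bound: partitions run concurrently, rows within a
    partition run sequentially."""
    return math.ceil(n_rows / n_partitions) * secs_per_row

def process_in_parallel(df, n_partitions, row_fn):
    """Sketch: repartition, then apply the slow per-row function inside
    mapPartitions so each executor works through its own slice."""
    def handle_partition(rows):
        for row in rows:
            yield row_fn(row)
    return df.repartition(n_partitions).rdd.mapPartitions(handle_partition)
```

With 8 partitions, for example, the bound is ceil(2000/8) * 3 = 750 seconds, ignoring scheduling overhead.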
Hi All,
Greetings ! I needed some help reading a Hive table
via PySpark for which the transactional property is set to 'true' (in other
words, the ACID property is enabled). Following is the entire stacktrace and
the description of the Hive table. Would you please be able to help
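On HDP 3.x, ACID Hive tables are typically read through the Hive Warehouse Connector rather than plain `spark.sql`. The sketch below assumes the HWC jar and the `pyspark_llap` zip are supplied to spark-submit; the query is a placeholder.

```python
def read_acid_table(spark, query="SELECT * FROM mydb.acid_table"):
    """Read a transactional (ACID) Hive table via the Hive Warehouse
    Connector (an assumption about the setup, not the poster's code;
    requires the HWC jar and pyspark_llap on the submit command)."""
    from pyspark_llap import HiveWarehouseSession
    hive = HiveWarehouseSession.session(spark).build()
    return hive.executeQuery(query)
```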
Georg - Thanks ! Will you be able to help me with a few examples please.
Thanks in advance again !
Cheers,
D
On Mon, Feb 12, 2018 at 6:03 PM, Georg Heiler <georg.kf.hei...@gmail.com>
wrote:
> You should look into window functions for spark sql.
> Debabrata Ghosh <mailford...@gma
Hi,
Greetings !
I need an efficient way in PySpark to execute a
comparison (on all the attributes) between the current row and the previous
row. My intent here is to leverage the distributed framework of Spark to
the best extent so that I can achieve a good
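The window-function suggestion in the reply above can be sketched with `lag()`, which puts the previous row's value alongside the current row so the comparison happens inside Spark. The ordering column and attribute names are assumptions:

```python
def diff_with_previous(df, order_col, attrs):
    """For each attribute, add a boolean <attr>_changed column comparing
    the current row to the previous row (ordered by `order_col`).
    The first row gets null, since it has no predecessor."""
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window
    w = Window.orderBy(order_col)  # add partitionBy(...) to keep it distributed
    for a in attrs:
        df = df.withColumn(a + "_changed",
                           F.col(a) != F.lag(a).over(w))
    return df

# The same idea for one column of values, in plain Python:
def changed_flags(values):
    return [prev != cur for prev, cur in zip(values, values[1:])]
```

A window with no `partitionBy` pulls everything onto one partition, so for large data a partition key is essential.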
Hi,
I have a dataframe column (the name of the column is CTOFF)
and I intend to prefix it with '0' in case the length of the value is 3.
Unfortunately, I am unable to achieve my goal and wonder whether you can
help me here.
Command which I am executing:
ctoff_dedup_prep_temp =
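The truncated command above is not recoverable, but `lpad` from `pyspark.sql.functions` does exactly this kind of zero-padding; a sketch (the target width of 4 is an assumption based on the description):

```python
def pad_ctoff(df, col_name="CTOFF", width=4):
    """Left-pad CTOFF with '0' to a fixed width, so '730' becomes
    '0730' while '1730' is left unchanged."""
    from pyspark.sql import functions as F
    return df.withColumn(col_name,
                         F.lpad(F.col(col_name).cast("string"), width, "0"))

# What lpad does, in plain Python:
def lpad_str(value, width=4, fill="0"):
    return str(value).rjust(width, fill)
```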
> nicholas.hakobian@rallyhealth.com> wrote:
>
>> Using explode on the 4th column, followed by an explode on the 5th column
>> would produce what you want (you might need to use split on the columns
>> first if they are not already an array).
>>
>
Hi,
Greetings !
I am having data in the format of the following row:
ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730
I want to convert it into several rows in the format below:
ABZ|ABZ|AF|2|1|730
ABZ|ABZ|AF|2|2|730
.
.
.
ABZ|ABZ|AF|3|1|730
ABZ|ABZ|AF|3|2|730
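The split-then-explode approach from the reply above, sketched in PySpark; the column names `c1`..`c6` are assumptions about the schema, and `cross_rows` shows the expected cross product in plain Python.

```python
def expand_rows(df):
    """Split the comma-separated 4th and 5th columns and explode each,
    yielding one output row per (value-from-c4, value-from-c5) pair.
    Column names c1..c6 are assumed, not from the post."""
    from pyspark.sql import functions as F
    return (df
            .withColumn("c4", F.explode(F.split("c4", ",")))
            .withColumn("c5", F.explode(F.split("c5", ","))))

# Expected shape of the result, in plain Python:
from itertools import product

def cross_rows(prefix, col4, col5, suffix):
    return [prefix + [a, b, suffix] for a, b in product(col4, col5)]
```

For the sample row, 18 values in the 4th column times 5 in the 5th gives 90 output rows.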
Hi All,
I am constantly hitting an error: "ApplicationMaster:
SparkContext did not initialize after waiting for 100 ms" while running my
Spark code in yarn cluster mode.
Here is the command I am using: spark-submit --master yarn
--deploy-mode cluster spark_code.py
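In yarn cluster mode this error usually means the script did heavy work (or blocked) before creating its SparkContext, so the ApplicationMaster timed out waiting for it. A skeleton for spark_code.py that creates the session first thing; the app name and body are placeholders:

```python
# Skeleton for spark_code.py (names are placeholders): build the
# SparkSession at the top of main() and do nothing heavyweight before it.
APP_NAME = "spark_code"

def main():
    from pyspark.sql import SparkSession
    # Creating the session first signals the ApplicationMaster that the
    # SparkContext is up; master/deploy-mode come from spark-submit.
    spark = SparkSession.builder.appName(APP_NAME).getOrCreate()
    try:
        spark.range(10).show()   # placeholder for the real job
    finally:
        spark.stop()

# spark-submit runs the script top to bottom; guard the entry point with:
#   if __name__ == "__main__":
#       main()
```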
Ayan,
Did you get the HBase connection working through
PySpark as well ? I have got the Spark - HBase connection working with
Scala (via HBaseContext). However, I eventually want to get this
working within PySpark code - would you have some suitable code snippets
or
Dear All,
Greetings ! I am repeatedly hitting a NullPointerException
while saving a Scala DataFrame to HBase. Could you please help me
resolve this? Here is the code snippet:
scala> def catalog = s"""{
||"table":{"namespace":"default", "name":"table1"},
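The truncated Scala snippet above is the catalog for the shc-style Spark-HBase connector; a malformed catalog is a common source of NullPointerExceptions there. Since other threads here ask for a PySpark route, a hedged sketch of the same pattern (column family and field names are illustrative, and the connector jar is assumed to be on the classpath):

```python
import json

# A catalog in the shape the shc connector expects; the table name
# matches the snippet above, the columns are illustrative.
CATALOG = json.dumps({
    "table": {"namespace": "default", "name": "table1"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf1", "col": "col1", "type": "string"},
    },
})

def read_from_hbase(spark, catalog=CATALOG):
    """Read an HBase table through the shc data source."""
    return (spark.read
            .options(catalog=catalog)
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load())
```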
Dear All,
Greetings !
I needed some best practices for integrating Spark
with HBase. Would you be able to point me to some useful resources / URLs
at your convenience please?
Thanks,
Debu
Hi,
While executing a SparkSQL query, I am hitting the
following error. Wonder, if you can please help me with a possible cause
and resolution. Here is the stacktrace for the same:
07/25/2017 02:41:58 PM - DataPrep.py 323 - __main__ - ERROR - An error
occurred while calling
>> 2017-06-09 14:50 GMT+02:00 Debabrata Ghosh <mailford...@gmail.com>:
>>
>>> Hi,
>>> I need some help / guidance in performance tuning
>>> Spark code written in Scala. Can you please help.
>>>
>>> Thanks
>>>
>>
>>
>
Hi,
I need some help / guidance in performance tuning
Spark code written in Scala. Can you please help.
Thanks