Hi,
I am using spark-sql-2.4.1v with Kafka.
I am facing a slow consumer issue.
I see the warning "KafkaConsumer cache hitting max capacity of 64, removing
consumer for
CacheKey(spark-kafka-source-33321dde-bfad-49f3-bdf7-09f95883b6e9--1249540122-executor)"
in the logs.
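Would raising the consumer cache capacity help? A sketch of what I mean (the
option name spark.sql.kafkaConsumerCache.capacity is my assumption from the
2.4 sources, so please verify it against your build):

import org.apache.spark.sql.SparkSession

// Raise the soft limit on cached Kafka consumers per executor
// (it defaults to 64, the number in the warning).
val spark = SparkSession.builder()
  .appName("kafka-cache-tuning")
  .config("spark.sql.kafkaConsumerCache.capacity", "128") // assumed option name
  .getOrCreate()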
More on the same:
Hi,
I have a scenario like the one below:
https://stackoverflow.com/questions/58134379/how-to-handle-backup-scenario-in-spark-structured-streaming-using-joins
How do I handle this use case (a back-up scenario) in
Spark Structured Streaming?
Any clues would be highly appreciated.
Thanks,
Shyam
Hi,
Though my Spark job works fine locally, it has an issue when run on the Spark
cluster.
Can anyone suggest what is wrong here?
https://stackoverflow.com/questions/57960569/accessing-external-yml-file-in-my-spark-job-code-not-working-throwing-cant-con
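In case someone can confirm: is the following the right way to read a file
shipped with --files? (A sketch; config.yml is a placeholder name.)

// Submitted with:  spark-submit --files config.yml ... my-app.jar
import org.apache.spark.SparkFiles
import scala.io.Source

// SparkFiles.get resolves the node-local copy of a file shipped via --files,
// so the same code works on the driver and on executors.
val path = SparkFiles.get("config.yml") // placeholder file name
val yml = Source.fromFile(path).mkString
println(yml)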
Regards,
Shyam
The difficult things in Spark are debugging and tuning.
Cool, but did you find a way, or any help or clue?
On Fri, Sep 6, 2019 at 11:40 PM David Zhou wrote:
> I have the same question as you do.
>
> On Thu, Sep 5, 2019 at 9:18 PM Shyam P wrote:
>
>> Hi,
>>
>> I am using spark-sql-2.4.1v for streaming in my PoC.
Hi,
I am using spark-sql-2.4.1v for streaming in my PoC.
How do I refresh a DataFrame loaded from an HDFS/Cassandra table every time a
new batch of the stream is processed? What is the general practice for
handling this kind of scenario?
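One approach I am considering (a sketch, assuming Spark 2.4's foreachBatch;
the topic, keyspace, table, and join key names are placeholders) is to
re-read the table inside every micro-batch:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("refresh-per-batch").getOrCreate()

val streamingDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092") // placeholder
  .option("subscribe", "my_topic")                // placeholder
  .load()

streamingDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Re-read the Cassandra table here so every micro-batch sees fresh rows.
    val lookupDF = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_ks", "table" -> "my_table")) // placeholders
      .load()
    batchDF.join(lookupDF, Seq("id")).show() // "id" is a placeholder join key
  }
  .start()
  .awaitTermination()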
Below is the SO link for more details.
Now I am getting a different error, as below:
com.datastax.spark.connector.types.TypeConversionException: Cannot convert
object [] of type class
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema to
com.datastax.driver.core.LocalDate.
at
J Franke,
Leaving Sqoop aside, I am just asking about Spark for ETL from Oracle ...?
Thanks,
Shyam
Hi Mich,
A lot of people say that Spark does not have the proven record in migrating
data from Oracle that Sqoop has,
at least in production.
Please correct me if I am wrong, and suggest how to deal with shuffling when
using groupBy?
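Currently I am only tuning the shuffle partition count, something like the
following (400 is just a value I am trying; "company_id" is a placeholder
column, and df stands for the Oracle-sourced DataFrame):

import org.apache.spark.sql.functions.count

// Raise the number of shuffle partitions produced by groupBy
// (the default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "400")

val agg = df.groupBy("company_id") // placeholder column on a placeholder df
  .agg(count("*").as("cnt"))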
Thanks,
Shyam
On Sat, Aug 31, 2019 at 12:17 PM Mich
ndra connector library written for spark
> streaming because we wrote one ourselves when we wanted to do the same.
>
> Regards
> Prathmesh Ranaut
> https://linkedin.com/in/prathmeshranaut
>
> On Aug 29, 2019, at 7:21 AM, Shyam P wrote:
>
> Hi,
>
> I need to
Hi,
I need to do a PoC for a business use case.
*Use case:* Need to update a record in a Cassandra table if it exists.
Does Spark Streaming support comparing each record and updating the existing
Cassandra record?
For each record received from the Kafka topic, if I want to check and compare
each record
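Would a plain append suffice here, given that Cassandra upserts on the
primary key? A sketch of what I mean (the keyspace/table names are
placeholders):

// Every Cassandra write is an upsert on the primary key, so appending a row
// with an existing key updates that record in place.
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table")) // placeholders
  .mode("append")
  .save()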
>
> I updated the issue content.
>
https://stackoverflow.com/questions/57684972/how-to-improve-performance-my-spark-job-here-to-load-data-into-cassandra-table
Thank you.
Hi,
Are groupBy and partition similar in this scenario?
I know they are not the same and are meant for different purposes, but I am
confused here.
Do I still need to do partitioning here to save into Cassandra?
Below is my scenario.
I am using spark-sql-2.4.1, spark-cassandra-connector_2.11-2.4.1 with
Hi,
Any advice on how to do this in Spark SQL?
I have a scenario as below
dataframe1 = loaded from an HDFS Parquet file.
dataframe2 = read from a Kafka Stream.
If the column1 value of dataframe1 matches a columnX value of dataframe2,
then I need to replace the column1 value of dataframe1.
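A rough sketch of what I am after, if it helps clarify (the join key and the
replacement semantics are my assumptions, and stream-static join restrictions
would still need checking):

import org.apache.spark.sql.functions.{coalesce, col}

// Left-join the stream values onto dataframe1, then take columnX where a
// match exists and keep the old column1 otherwise.
val joined = dataframe1.join(
  dataframe2, dataframe1("column1") === dataframe2("columnX"), "left")
val replaced = joined
  .withColumn("column1", coalesce(col("columnX"), col("column1")))
  .drop("columnX")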
Hi,
Any help is appreciated.
https://stackoverflow.com/questions/56991447/in-spark-dataset-s-can-be-passed-as-input-args-to-a-function-to-get-out-put-args
Regards,
Shyam
reference/html/springandhadoop-spark.html
>
>
> On Mon, Jun 17, 2019 at 12:27 PM Shyam P wrote:
>
>> I am developing a Spark job using Java 1.8.
>>
>> Is it possible to write a Spark app using Spring Boot technology?
>> Has anyone tried it? If so, how
I am developing a Spark job using Java 1.8.
Is it possible to write a Spark app using Spring Boot technology?
Has anyone tried it? If so, how should it be done?
Regards,
Shyam
Hi,
Any clue why a Spark job goes into the UNDEFINED state?
More details are in the URL.
https://stackoverflow.com/questions/56545644/why-my-spark-sql-job-stays-in-state-runningfinalstatus-undefined
Appreciate your help.
Regards,
Shyam
https://stackoverflow.com/questions/56428367/any-clue-how-to-join-this-spark-structured-stream-joins
Hi,
Any suggestions regarding the issue below?
https://stackoverflow.com/questions/56524921/how-spark-structured-streaming-consumers-initiated-and-invoked-while-reading-mul
Thanks,
Shyam
Hi Deepak,
Why are you getting paths from a Kafka topic? Any specific reason to do so?
Regards,
Shyam
On Mon, Jun 10, 2019 at 10:44 AM Deepak Sharma
wrote:
> The context is different here.
> The file paths are coming as messages in a Kafka topic.
> Spark streaming (structured) consumes from this
https://stackoverflow.com/questions/56524539/how-to-handle-small-file-problem-in-spark-structured-streaming
Regards,
Shyam
Thank you so much, Alex Ott.
On Fri, May 31, 2019 at 6:05 PM Alex Ott wrote:
> Check the answer on SO...
>
> On Fri, May 31, 2019 at 1:04 PM Shyam P wrote:
>
>> Trying to save sample data into a C* table,
>>
>> I am getting the below error:
>>
>> *java.util
Trying to save sample data into a C* table,
I am getting the below error:
*java.util.NoSuchElementException: Columns not found in table
abc.company_vals: companyId, companyName*
Though I have all the columns and have rechecked them again and again,
I don't see any issue with the columns.
I am using
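One thing I plan to rule out (an assumption on my part, not a confirmed
cause): a case mismatch between my DataFrame columns and the lowercase
Cassandra columns. A quick sketch to normalize them:

// Cassandra column names are lowercase unless quoted at creation time;
// normalizing the DataFrame's columns rules out a pure case mismatch.
val normalized = df.columns.foldLeft(df) { (d, c) =>
  d.withColumnRenamed(c, c.toLowerCase)
}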
Hi,
https://stackoverflow.com/questions/56181135/design-can-kafka-producer-written-as-spark-job
Thank you,
Shyam
Hi,
I have an Oracle table which has
a column whose schema is: DATA_DATE DATE, with values like 31-MAR-02.
I am trying to retrieve data from Oracle using spark-sql-2.4.1. I
tried to set the JDBC options as below:
.option("lowerBound", "2002-03-31 00:00:00");
.option("upperBound",
Asmath,
Why is upperBound set to 300? How many cores do you have?
Check how the data is distributed in the Teradata DB table:
SELECT itm_bloon_seq_no, count(*) AS cc FROM TABLE
GROUP BY itm_bloon_seq_no ORDER BY itm_bloon_seq_no DESC;
Is this column "itm_bloon_seq_no" already in the table, or did you derive it in Spark
Junfeng Chen
>
>
> On Thu, Mar 14, 2019 at 2:26 PM Shyam P wrote:
>
>> cool.
>>
>> On Tue, Mar 12, 2019 at 9:08 AM JF Chen wrote:
>>
>>> Hi
>>> Finally I found the reason...
>>> It was caused by some long GC pauses on some datanodes. Afte
https://stackoverflow.com/questions/55823608/how-to-handle-spark-stddev-function-output-value-when-there-there-is-no-data
Regards,
Shyam
> On Fri, 5 Apr 2019 at 10:51, Shyam P wrote:
>
>> Hi,
>> In my scenario I have a few companies for which I need to calculate a few
>> stats, like avg, that need to be stored in Cassandra
Hi,
In my scenario I have a few companies for which I need to calculate a few
stats, like avg, that need to be stored in Cassandra. For the next set of
records I need to fetch the previously calculated stats, and on top of them
calculate accumulated results (i.e. the present set of data + previously
stored stats) and
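A rough sketch of the accumulation I have in mind, in batch form (the
keyspace, table, and column names are placeholders; I keep sum and count
rather than just the average so the accumulation stays mathematically
correct):

import org.apache.spark.sql.functions.{col, count, sum}

// 1. Stats for the new set of records, kept as sum + count so they
//    can be merged with what is already stored.
val newStats = newBatchDF.groupBy("company_id")
  .agg(sum("value").as("sum_value"), count("value").as("cnt"))

// 2. Previously stored stats from Cassandra (same sum/count layout).
val prevStats = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "company_stats")) // placeholders
  .load()
  .select("company_id", "sum_value", "cnt")

// 3. Accumulate and derive the average; writing back upserts by key.
val accumulated = newStats.union(prevStats)
  .groupBy("company_id")
  .agg(sum("sum_value").as("sum_value"), sum("cnt").as("cnt"))
  .withColumn("avg_value", col("sum_value") / col("cnt"))

accumulated.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "company_stats"))
  .mode("append")
  .save()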
kes a long time.
> Now I have decommissioned the broken data nodes, and now my spark runs
> well.
> I am trying to increase the heap size of data node to check if it can
> resolve the problem
>
> Regard,
> Junfeng Chen
>
>
> On Fri, Mar 8, 2019 at 8:54 PM Shyam P wrote
e spark UI I can ensure data is not skewed. There is only about
>> 100MB for each task, where most of tasks takes several seconds to write the
>> data to hdfs, and some tasks takes minutes of time.
>>
>> Regard,
>> Junfeng Chen
>>
>>
>> On Wed, Mar
It would be better if you shared a code block, to understand it better.
Otherwise it would be difficult to provide an answer.
~Shyam
On Wed, Mar 6, 2019 at 8:38 AM JF Chen wrote:
> When my kafka executor reads data from kafka, sometimes it throws the
> error "java.lang.AssertionError: assertion failed:
ngs, some tasks in write hdfs stage cost
> much more time than others, where the amount of writing data is similar.
> How to solve it?
>
> Regard,
> Junfeng Chen
>
>
> On Tue, Mar 5, 2019 at 3:05 PM Shyam P wrote:
>
>> Hi JF ,
>>
Thanks a lot, Roman.
But the provided link has several ways to deal with the problem.
Why do we need to do the operation on an RDD instead of a DataFrame/Dataset?
Do I need a custom partitioner in my case? How do I invoke it in Spark SQL?
Can anyone provide a sample of handling skewed data with Spark SQL?
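For instance, would a two-stage salted aggregation like the following be the
right direction? (The column names and the salt count of 10 are my guesses.)

import org.apache.spark.sql.functions.{col, count, rand, sum}

// Stage 1: spread each hot key over 10 salt buckets and pre-aggregate,
// so one skewed key no longer lands in a single partition.
val salted = df.withColumn("salt", (rand() * 10).cast("int"))
val partial = salted.groupBy(col("key"), col("salt"))
  .agg(sum("value").as("p_sum"), count("value").as("p_cnt"))

// Stage 2: drop the salt and merge the partial aggregates.
val result = partial.groupBy("key")
  .agg(sum("p_sum").as("total"), sum("p_cnt").as("cnt"))
  .withColumn("avg", col("total") / col("cnt"))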
Thanks,
Hi All,
I need to save a huge DataFrame as a Parquet file. As it is huge, this is
taking several hours. To improve performance, it is known that I have to send
it group-wise.
But when I do partition(columns*)/groupBy(columns*), the driver spills
a lot of data and performance suffers badly again.
So how
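One pattern I am considering (a sketch; "group_col" and the output path are
placeholders):

import org.apache.spark.sql.functions.col

// Repartition by the same column used in partitionBy, so each task writes
// to only a few output directories instead of all of them.
df.repartition(col("group_col"))      // "group_col" is a placeholder
  .write
  .partitionBy("group_col")
  .mode("overwrite")
  .parquet("/path/to/output")         // placeholder path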
Something is wrong with the query. Add a code snippet of exactly what you are
trying to do.
~Shyam
On Fri, Mar 1, 2019 at 1:07 PM yuvraj singh <19yuvrajsing...@gmail.com>
wrote:
> Hi,
>
> I am running Spark as a service; when we change some SQL schema we are
> facing some problems.
>
> ERROR
Hi JF,
Try to execute this before df.write:
// Count rows per partition to see how the data is distributed.
import org.apache.spark.sql.functions.spark_partition_id
df.groupBy(spark_partition_id()).count.show()
You will come to know how the data has been partitioned inside the df.
A small trick we can apply here while
What IRC channel should we join?
On Tue, 19 Feb 2019, 17:56 Robert Kaye, wrote:
> Hello!
>
> I’m Robert Kaye from the MetaBrainz Foundation — we’re the people behind
> MusicBrainz ( https://musicbrainz.org ) and more recently ListenBrainz (
> https://listenbrainz.org ). ListenBrainz is aiming
Hi,
I have a scenario where I need to ingest data into a master table which has
a large number of columns, including a few columns like "Country_Id",
"CountryName", "Date", etc.
Every time I load data with new records, this "Date" changes to the data
generation date. Every time, each country's data might