Hi,
Could you provide the code snippet showing how you are connecting to and
reading data from Kafka?
Akshay Bhardwaj
+91-97111-33849
On Thu, Oct 17, 2019 at 8:39 PM Amit Sharma wrote:
> Please update me if any one knows about it.
>
>
> Thanks
> Amit
>
> On Thu, Oct 10,
Standalone Spark process (master set to local[*])?
Spark master-slave cluster?
YARN or Mesos cluster, etc.?
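The deployment modes asked about above map to different --master URLs passed to spark-submit; the host names, ports, and application file below are placeholders:

```shell
# Placeholders: "app.py", host names, and ports are illustrative only.
spark-submit --master local[*]          app.py   # standalone local process
spark-submit --master spark://host:7077 app.py   # Spark master-slave cluster
spark-submit --master yarn              app.py   # YARN cluster
spark-submit --master mesos://host:5050 app.py   # Mesos cluster
```

Checking spark.master on the Environment tab of the Spark UI confirms which mode a running job is actually using.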
Akshay Bhardwaj
+91-97111-33849
On Mon, Oct 21, 2019 at 11:20 AM Manuel Sopena Ballesteros <
manuel...@garvan.org.au> wrote:
> Dear Apache Spark community,
>
>
>
was then irrespective of the
cluster manager used.
Akshay Bhardwaj
+91-97111-33849
On Tue, Jun 11, 2019 at 7:41 PM Shyam P wrote:
> Hi,
> Any clue why spark job goes into UNDEFINED state ?
>
> More details are in the URL.
>
> https://stackoverflow.com/questions/56545644/why-my-spark-sql-
Additionally there is "uuid" function available as well if that helps your
use case.
Akshay Bhardwaj
+91-97111-33849
On Thu, Jun 6, 2019 at 3:18 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi Marcelo,
>
> If you are using spark 2.3+ and dataset
functions.
Akshay Bhardwaj
+91-97111-33849
On Thu, May 30, 2019 at 4:05 AM Marcelo Valle
wrote:
> Hi all,
>
> I am new to spark and I am trying to write an application using dataframes
> that normalize data.
>
> So I have a dataframe `denormalized_cities` with 3 columns: CO
object stores before they can be
referenced in Spark.
As you mention you are using Azure blob files, this should explain the
behaviour where everything seems to stop. You can reduce this time by
ensuring you have a small number of large files in your blob store to read
from, rather than the other way around.
Akshay Bhardwaj
Hi,
Add writeStream.option("quoteMode", "NONE")
By default, the Spark Dataset API assumes that all values must be enclosed
in a quote character (default: ") while writing to CSV files.
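As a hedged sketch of where that option goes (note: "quoteMode" comes from the older spark-csv package; the built-in Spark 2.x CSV writer exposes related knobs such as "quote" and "quoteAll", so the exact option name depends on which writer is in play; paths below are placeholders):

```python
# Sketch only; the option name and paths depend on your Spark version/setup.
(df.writeStream
   .format("csv")
   .option("quoteMode", "NONE")              # as suggested above (spark-csv style)
   .option("checkpointLocation", "/tmp/chk") # placeholder path
   .option("path", "/tmp/out")               # placeholder path
   .start())
```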
Akshay Bhardwaj
+91-97111-33849
On Tue, May 21, 2019 at 5:34 PM 杨浩 wrote:
> We us
Hi Hari,
Thanks for this information.
Do you have any resources on, or can you explain, why YARN has this as the
default behaviour? What would be the advantages of, or scenarios for, having
multiple assignments in a single heartbeat?
Regards
Akshay Bhardwaj
+91-97111-33849
On Mon, May 20, 2019 at 1:29 PM Hariharan
Hi All,
Just floating this email again. Grateful for any suggestions.
Akshay Bhardwaj
+91-97111-33849
On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi All,
>
> I am running Spark 2.3 on YARN using HDP 2.6
>
> I am running sp
YARN decide which nodes to launch containers?
I have around 12 YARN nodes running in the cluster, but I still see
repeated patterns of 3-4 containers launched on the same node for a
particular job.
What is the best way to start debugging the reason for this?
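A few standard YARN CLI commands are a reasonable starting point for debugging placement (the application id below is a placeholder):

```shell
yarn node -list -all                       # nodes with their available resources
yarn application -status application_123   # placeholder id; shows where the AM ran
# The ResourceManager UI (default http://<rm-host>:8088/cluster/scheduler)
# shows per-node container allocations for each queue.
```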
Akshay Bhardwaj
+91-97111-33849
can communicate with Name node service?
Akshay Bhardwaj
+91-97111-33849
On Thu, May 16, 2019 at 4:27 PM Rishi Shah wrote:
> on yarn
>
> On Thu, May 16, 2019 at 1:36 AM Akshay Bhardwaj <
> akshay.bhardwaj1...@gmail.com> wrote:
>
>> Hi Rishi,
>>
>> Are you
Hi Anton,
Do you have the option of storing the JAR file on HDFS, which can be
accessed via spark in your cluster?
Akshay Bhardwaj
+91-97111-33849
On Thu, May 16, 2019 at 12:04 AM Oleg Mazurov
wrote:
> You can see what Uber JVM does at
> https://github.com/uber-common/jvm-pr
Hi Rishi,
Are you running spark on YARN or spark's master-slave cluster?
Akshay Bhardwaj
+91-97111-33849
On Thu, May 16, 2019 at 7:15 AM Rishi Shah wrote:
> Any one please?
>
> On Tue, May 14, 2019 at 11:51 PM Rishi Shah
> wrote:
>
>> Hi All,
>>
>> At times
*Say I have 2 documents in a partition; based on one field I want to
index the document, and based on another field I want to update the
document with an inline script.*
*Is there a possibility to do this in the same writeStream for Elasticsearch
in Spark Structured Streaming?*
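One possible shape for this, assuming Spark 2.4+'s foreachBatch and the es-hadoop connector, is to split each micro-batch by the routing field. The field name "action", the index name "my-index", and the inline script here are all made-up placeholders, not from the thread:

```python
# Hypothetical sketch; "action", "my-index", and the script are placeholders.
def write_to_es(batch_df, batch_id):
    # documents to be (re)indexed
    (batch_df.filter("action = 'index'")
        .write.format("org.elasticsearch.spark.sql")
        .option("es.write.operation", "index")
        .mode("append")
        .save("my-index"))
    # documents to be updated with an inline script
    (batch_df.filter("action = 'update'")
        .write.format("org.elasticsearch.spark.sql")
        .option("es.write.operation", "update")
        .option("es.update.script.inline", "ctx._source.hits += 1")
        .mode("append")
        .save("my-index"))

stream_df.writeStream.foreachBatch(write_to_es).start()
```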
Akshay Bhardwaj
ress status displays a lot of metrics that should be
your first diagnostic for identifying issues.
The progress status with a Kafka stream displays the "startOffset" and
"endOffset" values per batch. These list, per topic-partition, the start
to end offsets per trigger batch of stre
have
a streaming interval of 500ms, reading data from a Kafka topic with a max
batch size of 1000.
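The setup described (500ms trigger, max 1000 records per batch) can be sketched like this; the broker address and topic name are placeholders:

```python
stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "events")                     # placeholder topic
    .option("maxOffsetsPerTrigger", 1000)              # max batch size of 1000
    .load())

query = (stream.writeStream
    .trigger(processingTime="500 milliseconds")        # 500ms interval
    .format("console")
    .option("checkpointLocation", "/tmp/chk")          # placeholder
    .start())
```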
Akshay Bhardwaj
+91-97111-33849
proved to be unreliable, as I
have encountered corrupted files which cause errors on job restarts.
Akshay Bhardwaj
+91-97111-33849
On Wed, May 1, 2019 at 3:20 PM Anastasios Zouzias wrote:
> Hi,
>
> Have you checked the docs, i.e.,
> https://spark.apache.org/docs/latest/structur
Hi All,
Floating this again. Any suggestions?
Akshay Bhardwaj
+91-97111-33849
On Tue, Apr 30, 2019 at 7:30 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi Experts,
>
> I am using spark structured streaming to read message from Kafka, with a
> producer that w
if the checksum is not present in the KV store.
- My doubt with this approach is how to ensure a safe write to both the
2nd topic and the KV store (for the checksum) in the case of unexpected
failures. How does that guarantee exactly-once with restarts?
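A minimal, runnable sketch of the checksum-based deduplication being discussed, with the KV store modelled as an in-memory dict (a real deployment would use an external store; the non-atomic write window flagged in the comments is exactly the failure case in question):

```python
import hashlib


class DedupWriter:
    """Sketch of checksum-based deduplication before publishing to a 2nd topic.

    The dict stands in for an external KV store, and the list stands in for
    the downstream Kafka topic -- both are assumptions for illustration.
    """

    def __init__(self):
        self.kv_store = {}   # checksum -> True
        self.published = []  # downstream "topic"

    @staticmethod
    def checksum(record: str) -> str:
        return hashlib.sha256(record.encode("utf-8")).hexdigest()

    def process(self, record: str) -> bool:
        """Publish only if unseen; returns True if written, False if skipped."""
        c = self.checksum(record)
        if c in self.kv_store:
            return False  # already handled in a previous (possibly failed) run
        # NOTE: the two writes below are not atomic; a crash between them is
        # the failure window raised above. Publishing first and recording the
        # checksum last yields at-least-once delivery plus idempotent replay,
        # which approximates (but does not strictly guarantee) exactly-once.
        self.published.append(record)
        self.kv_store[c] = True
        return True
```

Replaying the same input after a restart then re-publishes only the records whose checksums never made it into the store.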
Any suggestions are highly appreciated.
Akshay
Hi Austin,
Are you using Spark Streaming or Structured Streaming?
For better understanding, could you also provide sample code/config params
for your spark-kafka connector for the said streaming job?
Akshay Bhardwaj
+91-97111-33849
On Mon, Apr 29, 2019 at 10:34 PM Austin Weaver wrote
Hi,
In your spark-submit command, try using the below config property and see
if this solves the problem.
--conf spark.sql.files.ignoreCorruptFiles=true
For me, this worked to skip reading empty/partially uploaded gzip files in
an S3 bucket.
Akshay Bhardwaj
+91-97111-33849
On Thu, Mar 7, 2019
Hi Pankaj,
What version of Spark are you using?
If you are using 2.4+ then there is an inbuilt function "to_json" which
converts the columns of your dataset to JSON format.
https://spark.apache.org/docs/2.4.0/api/sql/#to_json
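A usage sketch of that function via the Python API (the column names are placeholders):

```python
from pyspark.sql.functions import struct, to_json

# Collapse two placeholder columns into a single JSON string column.
json_df = df.select(to_json(struct("col_a", "col_b")).alias("json_payload"))
```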
Akshay Bhardwaj
+91-97111-33849
On Wed, Mar 6, 2019 a
Also, what is the average kafka record message size in bytes?
Akshay Bhardwaj
+91-97111-33849
On Wed, Mar 6, 2019 at 1:26 PM Akshay Bhardwaj <
akshay.bhardwaj1...@gmail.com> wrote:
> Hi,
>
> To better debug the issue, please check the below co
is not set, then poll.ms
defaults to spark.network.timeout)
Akshay Bhardwaj
+91-97111-33849
On Wed, Mar 6, 2019 at 8:39 AM JF Chen wrote:
> When my kafka executor reads data from kafka, sometimes it throws the
> error "java.lang.AssertionError: assertion failed: Failed to
as schemas of the tables/views
used.
If there is an issue with your SQL syntax, then the method throws the below
exception, which you can catch:
org.apache.spark.sql.catalyst.parser.ParseException
Hope this helps!
Akshay Bhardwaj
+91-97111-33849
On Fri, Mar 1, 2019 at 10:23 PM kant kodali wrote:
are accessible?
3) Have you checked the memory consumption of the executors/driver running
in the cluster?
Akshay Bhardwaj
+91-97111-33849
On Wed, Feb 27, 2019 at 8:27 PM lokeshkumar wrote:
> Hi All
>
> We are running Spark version 2.4.0 and we run few Spark streaming jobs
> listening on
Hi Gabor,
I guess you are looking at Kafka 2.1, but Guillermo mentioned initially that
they are working with Kafka 1.0.
Akshay Bhardwaj
+91-97111-33849
On Wed, Feb 27, 2019 at 5:41 PM Gabor Somogyi
wrote:
> Where exactly? In Kafka broker configuration section here it's 10080:
>
Hi Gabor,
I am talking about offset.retention.minutes, which defaults to 1440
(i.e. 24 hours).
Akshay Bhardwaj
+91-97111-33849
On Wed, Feb 27, 2019 at 4:47 PM Gabor Somogyi
wrote:
> Hi Akshay,
>
> The feature what you've mentioned has a default value of 7 days...
>
> BR,
&
Hi Guillermo,
What was the interval between restarts of the Spark job? As a feature of
Kafka, a broker deletes offsets for a consumer group after 24 hours of
inactivity.
In such a case, the newly started spark streaming job will read offsets
from beginning for the same groupId.
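The broker-side knob in question, with its Kafka 1.0-era default, looks like this:

```
# Kafka broker configuration (Kafka 1.0-era default shown)
offsets.retention.minutes=1440   # i.e. 24 hours of retention
```

Raising this value on the brokers, or restarting the job within the retention window, avoids the restart-from-beginning behaviour (assuming the job relies on committed group offsets rather than its own checkpoint directory).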
Akshay Bhardwaj
"startOffset" : {
  "kafka_events_topic" : {
    "2" : 32822078,
    "1" : 114248484,
    "0" : 114242134
  }
},
"endOffset" : {
  "kafka_events_topic" : {
    "2" : 32822496,
while also filtering events fetched
from a CSV file?
I am also open to suggestions if there is a better way of filtering out the
prohibited events in structured streaming.
Thanks in advance.
Akshay Bhardwaj
+91-97111-33849