Hi Team,
We are not getting any errors when retrieving data from a Hive table in
PySpark, but we are getting the error scala.MatchError: MATERIALIZED_VIEW (of
class org.apache.hadoop.hive.metastore.TableType). Please let me know the
resolution for this.
Thanks
Thanks
monotonically_increasing_id() will give the same functionality.
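As a sketch (assuming a SparkSession named spark is available), this adds such a column. Note the ids are unique and increasing but not consecutive, unlike MySQL's AUTO_INCREMENT:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("demo").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c").toDF("value")

// Adds a unique, monotonically increasing 64-bit id per row.
// The partition id is encoded in the upper bits, so ids jump
// between partitions instead of being consecutive.
val withId = df.withColumn("id", monotonically_increasing_id())
withId.show()
```

If strictly consecutive ids are required, zipWithIndex on the underlying RDD is the usual (slower) alternative.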
On Mon, 7 Feb, 2022, 6:57 am , wrote:
> For a DataFrame object, how do I add an auto-increment column like
> MySQL's AUTO_INCREMENT behavior?
>
> Thank you.
>
(dfKafkaPayload.select("value").as[String]).schema
But while executing the same via a Spark Streaming job, we cannot do the
above, since streaming can have only one action.
Please let me know.
Thanks
Siva
Hi Team,
I have a spark streaming job which I am running in a single node
cluster. I often see the scheduling delay exceed the processing time in the
streaming statistics a few minutes after my application starts. What does that
mean? Should I increase the number of receivers?
Regards
Taun
Hi Jainshasha,
I need to read each row from the DataFrame and make some changes to it before
inserting it into ES.
Thanks
Siva
On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote:
> Hi Siva
>
> To emit data into ES using a Spark Structured Streaming job you need to use
> ElasticSearch j
Hi Team,
I have a Spark streaming job which reads from Kafka and writes into
Elasticsearch via HTTP requests.
I want to validate each record from Kafka, change the payload per the
business need, and write it into Elasticsearch.
I have used ES HTTP requests to push the data into Elasticsearch. Can
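A minimal sketch of that validate/transform/push pipeline, assuming Spark 2.4+ Structured Streaming; validate, transform, and postToEs are hypothetical helpers standing in for the business logic and the HTTP client:

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helpers -- replace with real validation/transformation/HTTP code.
def validate(payload: String): Boolean = payload.nonEmpty
def transform(payload: String): String = payload.trim
def postToEs(doc: String): Unit = { /* one HTTP request per document */ }

// kafkaDf is assumed to be a streaming DataFrame read from Kafka
// with a string "value" column.
def startQuery(kafkaDf: DataFrame) =
  kafkaDf.writeStream
    .foreachBatch { (batch: DataFrame, batchId: Long) =>
      batch.foreachPartition { rows: Iterator[Row] =>
        rows.map(_.getAs[String]("value"))
          .filter(validate)   // drop invalid payloads
          .map(transform)     // reshape per business need
          .foreach(postToEs)  // push each document over HTTP
      }
    }
    .start()
```

Doing the HTTP calls per partition keeps connection setup off the driver; batching requests inside each partition would reduce per-document overhead further.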
Hi all,
I am using Spark Structured Streaming (Version 2.3.2). I need to read from
Kafka Cluster and write into Kerberized Kafka.
Here I want to use Kafka for offset checkpointing after each record is
written into the Kerberized Kafka.
Questions:
1. Can we use Kafka for checkpointing to manage offset
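For reference, Structured Streaming tracks its Kafka source offsets in the query's checkpoint location rather than in Kafka itself; a sketch (broker address, topic, and path are placeholders):

```scala
import org.apache.spark.sql.DataFrame

// df is assumed to be the transformed streaming DataFrame with
// "key" and "value" columns ready for the Kafka sink.
def startKafkaSink(df: DataFrame) =
  df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "secure-broker:9093")
    .option("topic", "out-topic")
    // Offsets and sink state are recovered from here on restart;
    // it must be fault-tolerant storage (e.g. HDFS), not local disk.
    .option("checkpointLocation", "/checkpoints/my-job")
    .start()
```

With this in place, committing offsets back to Kafka is not needed for recovery, though it can still be useful for external lag monitoring.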
Yes, I am also facing the same issue. Did you figure it out?
On Tue, 9 Jul 2019, 7:25 pm Kamalanathan Venkatesan, <
kamalanatha...@in.ey.com> wrote:
> Hello,
>
>
>
> I have below spark structural streaming code and I was expecting the
> results to be printed on the console every 10 seconds. But, I
Hi Team,
I need help with the windowing & watermark concepts. This code is not working
as expected.
package com.jiomoney.streaming
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.ProcessingTime
object SlingStreaming {
def
ect statement. If I'm not mistaken, it is known
> to be a bit costly, since each call produces a new Dataset. Defining a
> schema and using from_json will eliminate all of the withColumn calls
> and the extra get_json_object calls.
>
> - Jungtaek
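To illustrate the suggestion above, a sketch with a hypothetical two-field schema (assumes spark.implicits._ is imported and df has a string "value" column holding JSON):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Explicit schema for the JSON payload (fields are examples).
val schema = new StructType()
  .add("id", LongType)
  .add("name", StringType)

// One from_json call parses the whole payload at once, replacing
// a chain of withColumn + get_json_object calls per field.
val parsed = df
  .withColumn("parsed", from_json(col("value"), schema))
  .select(col("parsed.id"), col("parsed.name"))
```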
Hello All,
I am using Spark 2.3 and I am trying to write a Spark Streaming join.
It is a basic join, but it is taking a long time to join the stream data. I am
not sure what configuration we need to set on Spark.
Code:
*
import org.apache.spark.sql.SparkSession
import
Hi,
When I run calculations for, say, 700 listIDs, it saves only about 50 rows
and then throws seemingly random exceptions.
I get the exception below when I try to run calculations on large data and
save it. Please let me know if you have any suggestions.
Sample Code :
I have some
Hi
I am getting the exception below when I run spark-submit on a Linux machine.
Can someone give a quick solution with commands?
Driver stacktrace:
- Job 0 failed: count at DailyGainersAndLosersPublisher.scala:145, took
5.749450 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4
You can try this; it will work (save() returns Unit, so there is nothing to
assign):
merchantdf.write
  .format("org.apache.spark.sql.cassandra")
  .mode(SaveMode.Overwrite)
  .option("confirm.truncate", true)
  .options(Map("table" -> "tablename", "keyspace" -> "keyspace"))
  .save()
On Wed 27 Jun,
Hello, can I do complex data manipulations inside a groupBy? i.e., I want to
group my whole DataFrame by a column and then do some processing for each
group.
Hello Asmath,
We had a similar challenge recently.
When you write back to Hive, you create files on HDFS, and how many depends on
your batch window.
If you increase your batch window from, say, 1 min to 5 mins, you will end up
creating 5x fewer files.
The other factor is your partitioning.
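A sketch of keeping the file count down by coalescing each batch before the write (the partition column "dt", table name, and coalesce target of 1 are illustrative, not a recommendation for every workload):

```scala
import org.apache.spark.sql.DataFrame

def writeBatchToHive(df: DataFrame): Unit =
  df.coalesce(1)              // collapse to one output file per batch
    .write
    .mode("append")
    .partitionBy("dt")        // hypothetical date partition column
    .format("orc")
    .saveAsTable("mydb.events")
```

A periodic compaction job over the partition directories is the usual complement when the batch window cannot be widened.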
t it reads
> the file, but it should not read all the content, which is probably also not
> happening.
>
> On 24. Oct 2017, at 18:16, Siva Gudavalli <gudavalli.s...@yahoo.com.INVALID
> <mailto:gudavalli.s...@yahoo.com.INVALID>> wrote:
>
>>
>> Hello,
>>
92 DESC], output=[id#192])
+- ConvertToSafe
   +- Project [id#192]
      +- Filter (usr#199 = AA0YP)
         +- HiveTableScan [id#192,usr#199], MetastoreRelation default, hlogsv5, None, [(cdt#189 = 20171003),(usrpartkey#191 = hhhUsers)]
Please let me know if I am missing anything here. Thank you.
On Monday,
Hello,
I am working with Spark SQL to query a Hive managed table (in ORC format).
I have my data organized by partitions and was asked to set an index for every
50,000 rows by setting ('orc.row.index.stride'='5')
Let's say that after evaluating the partition there are around 50 files in
which the data is
Hello,
I have my data stored in Parquet file format. My data is already partitioned
by date and key. Now I want the data in each file to be sorted by a new Code
column.
date1
  -> key1
     -> paqfile1
     -> paqfile2
  -> key2
     -> paqfile1
     -> paqfile2
date2
operation using only one task. I couldn't increase the
parallelism.
Thanks in advance
Thanks
Siva
Use spark-xml version 0.3.3:
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.3.3</version>
</dependency>
On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:
> Hi Siva
>
> This is what i have for jars. Did you manage to run with these or
> different versions ?
>
>
>
> org.apache.s
Hi Marco,
I ran it in the IDE (IntelliJ) as well; it works fine.
VG, make sure the right jar is on the classpath.
--Siva
On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> and your eclipse path is correct?
> i suggest, as Siva did before, to build your jar an
Try to import the class and see if you get a compilation error:
import com.databricks.spark.xml
Siva
On Fri, Jun 17, 2016 at 4:02 PM, VG <vlin...@gmail.com> wrote:
> nopes. eclipse.
>
>
> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
If you are running from IDE, Are you using Intellij?
On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
> Can you try to package as a jar and run using spark-submit
>
> Siva
>
> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>
Can you try to package as a jar and run using spark-submit
Siva
On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
> I am trying to run from IDE and everything else is working fine.
> I added spark-xml jar and now I ended up into this dependency
>
> 6/0
If it's not working,
add the packages list while executing spark-submit/spark-shell as below:
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3
$SPARK_HOME/bin/spark-submit --packages com.databricks:spark-xml_2.10:0.3.3
On Fri, Jun 17, 2016 at 2:56 PM, Siva
Just try using "xml" as the format, like below:
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("xml")
    .option("rowTag", "row")
    .load("A.xml");
FYR: https://gi
e changes will break
> Java serialization.
>
> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli <gss.su...@gmail.com>
> wrote:
>
>> hello,
>>
>> i am writing a spark streaming application to read data from kafka. I am
>> using no receiver approach and
Hello,
I am writing a Spark Streaming application to read data from Kafka. I am
using the no-receiver approach and have enabled checkpointing to make sure I
am not reading messages again in case of failure (exactly-once semantics).
I have a quick question about how checkpointing needs to be configured to
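The classic configuration for this is the StreamingContext.getOrCreate pattern, sketched below; on restart the context, and with it the direct-stream Kafka offsets, are recovered from the checkpoint directory (app name, batch interval, and path are placeholders, and the Kafka stream setup is elided):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/kafka-job" // must be fault-tolerant storage

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-job")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... build the direct Kafka stream and all transformations here,
  // inside the factory, so they are part of the recovered graph ...
  ssc
}

// Recovers from the checkpoint if one exists; otherwise builds fresh.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

One caveat worth knowing: checkpoints serialize the job's code, so recovering across an application upgrade generally does not work.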
has been provided to all
> the executors in your cluster. Most of the class not found errors got
> resolved for me after making required jars available in the SparkContext.
>
> Thanks.
>
> From: Ted Yu <yuzhih...@gmail.com>
> Date: Saturday, 12 March 2016 at 7:17 AM
Hi Everyone,
All of a sudden, we are encountering the below error from one of the Spark
consumers. It used to work without any issues.
When I restart the consumer with the latest offsets, it works fine for a
while (it executes a few batches) and then fails again; this issue is
intermittent.
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Siva [mailto:sbhavan...@gmail.com]
> *Sent:* Friday, January 29, 2016 5:40 PM
> *To:* Mohammed Guller
>
Hi Everyone,
We are using Spark 1.4.1 and we have a requirement to write data to the local
fs instead of HDFS.
When trying to save an RDD to the local fs with saveAsTextFile, it just writes
a _SUCCESS file in the folder with no part- files, and there are also no error
or warning messages on the console.
Is there any
r you running Spark on a single machine?
>
>
>
> You can change Spark’s logging level to INFO or DEBUG to see what is going
> on.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/&g
Hi Everyone,
Avro data written by a DataFrame to HDFS is not readable by Hive. We are
saving the data in Avro format with the statement below.
df.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path))
I created a Hive Avro external table, and while reading I see all nulls. Did
anyone face a similar
Thanks a lot Saisai and Zhan, I see DefaultResourceCalculator currently
being used for Capacity scheduler. We will change it to
DominantResourceCalculator.
Thanks,
Sivakumar Bhavanari.
On Mon, Dec 21, 2015 at 5:56 PM, Zhan Zhang wrote:
> BTW: It is not only a Yarn-webui
Hi Everyone,
I am observing a strange problem while submitting a Spark streaming job in
yarn-cluster mode through spark-submit: all the executors use only 1 Vcore,
irrespective of the value of the --executor-cores parameter.
Are there any config parameters that override the --executor-cores value?
Thanks,
Hi Kalpesh,
Just to add: you could use "yarn logs -applicationId " to
see the aggregated logs once the application has finished.
Thanks,
Sivakumar Bhavanari.
On Mon, Dec 21, 2015 at 3:56 PM, Zhan Zhang wrote:
> Hi Kalpesh,
>
> If you are using spark on yarn, it may not work.
, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Hi Siva,
>
> How did you know that --executor-cores is ignored and where did you see
> that only 1 Vcore is allocated?
>
> Thanks
> Saisai
>
> On Tue, Dec 22, 2015 at 9:08 AM, Siva <sbhavan...@gmail.com> wrote:
>
Ref: https://issues.apache.org/jira/browse/SPARK-11953
In Spark 1.3.1 we have two methods, createJDBCTable and insertIntoJDBC.
They are replaced by write.jdbc() in Spark 1.4.1.
createJDBCTable performs a CREATE TABLE ... (DDL) on the table,
followed by INSERT (DML).
insertIntoJDBC
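A sketch of the 1.4.1 replacement; the URL, table name, and credentials are placeholders, and SaveMode is what now selects between the two old behaviors:

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

def writeToOracle(df: DataFrame): Unit = {
  val props = new Properties()
  props.setProperty("user", "scott")       // placeholder credentials
  props.setProperty("password", "tiger")

  // SaveMode.Append inserts into an existing table (like insertIntoJDBC);
  // SaveMode.Overwrite drops/creates then inserts (like createJDBCTable).
  df.write
    .mode(SaveMode.Append)
    .jdbc("jdbc:oracle:thin:@host:1521:orcl", "MY_TABLE", props)
}
```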
Hi,
I am trying to write a DataFrame from Spark 1.4.1 to Oracle 11g.
I am using
dataframe.write.mode(SaveMode.Append).jdbc(url, tablename, properties)
but this always tries to create a table.
I would like to insert records into an existing table instead of creating a
new one every time.
Hi,
Could someone recommend monitoring tools for Spark streaming?
By extending StreamingListener we can dump the batch processing delays and
some alert messages.
But are there any web UI tools where we can monitor failures, see processing
delays and error messages, and set up alerts?
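For reference, a minimal StreamingListener sketch of the approach mentioned above; the 10-second threshold and println alerting are illustrative stand-ins for a real alerting hook:

```scala
import org.apache.spark.streaming.scheduler.{
  StreamingListener, StreamingListenerBatchCompleted}

// Logs the scheduling delay of each completed batch and flags
// batches that fall behind a chosen threshold.
class DelayListener extends StreamingListener {
  private val thresholdMs = 10000L // illustrative alert threshold

  override def onBatchCompleted(
      batch: StreamingListenerBatchCompleted): Unit = {
    val delayMs = batch.batchInfo.schedulingDelay.getOrElse(0L)
    if (delayMs > thresholdMs) {
      println(s"ALERT: batch ${batch.batchInfo.batchTime} " +
        s"scheduling delay ${delayMs} ms")
    }
  }
}

// Registered on the StreamingContext, e.g.:
// ssc.addStreamingListener(new DelayListener())
```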
r$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/20 22:39:10 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 16,
localhost): java.lang.RuntimeException: hbase-default.xml file seems to be
for and old version of HBase (null), this version is
0.98.4.2.2.4.2-2-hadoop2
Thanks,
Siva.
I want to program in Scala for Spark.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Eclipse-IDE-Maven-tp23977p23981.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Thanks
Siva
http://www.meetup.com/Bay-Area-Stream-Processing/events/219086133/
Thursday, June 4, 2015
6:45 PM
TubeMogul
http://maps.google.com/maps?f=qhl=enq=1250+53rd%2C+Emeryville%2C+CA%2C+94608%2C+us
1250 53rd St #1
Emeryville, CA
6:45PM to 7:00PM - Socializing
7:00PM to 8:00PM - Talks
8:00PM to
/218816482/?action=detaileventId=218816482
We meet every month in East Bay (Emeryville, CA). I am looking for someone
to give a talk about Spark for the next meetup (Feb 5th)
Let me know if you are interested in giving a talk.
Thanks,
-- Siva Jagadeesan
Hi All,
I am new to Spark and running the Pi example on a YARN cluster. I am getting
the following exception:
Exception in thread "main" java.lang.NullPointerException
at
scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
at