Error when trying to get the data from Hive Materialized View

2023-10-21 Thread Siva Sankar Reddy
Hi Team, we are not getting any error when retrieving data from a Hive table in PySpark, but we are getting the error scala.MatchError: MATERIALIZED_VIEW (of class org.apache.hadoop.hive.metastore.TableType). Please let me know the resolution for this? Thanks

Re: add an auto_increment column

2022-02-06 Thread Siva Samraj
monotonically_increasing_id() will give the same functionality. On Mon, 7 Feb, 2022, 6:57 am , wrote: > For a dataframe object, how to add a column that is auto_increment like > mysql's behavior? > > Thank you. > > - > To
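Worth noting: the IDs this function produces are unique and increasing but not consecutive like MySQL's AUTO_INCREMENT. Per Spark's API documentation, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. A pure-Python sketch of that documented bit layout (illustrative only, not Spark's actual code):

```python
def monotonically_increasing_id(partition_id: int, row_in_partition: int) -> int:
    """Sketch of the ID layout Spark documents for
    monotonically_increasing_id(): upper 31 bits carry the
    partition ID, lower 33 bits carry the row offset within
    that partition."""
    return (partition_id << 33) | row_in_partition

# Rows in partition 0 get 0, 1, 2, ...
print(monotonically_increasing_id(0, 2))  # 2
# The first row of partition 1 jumps to 2**33 = 8589934592,
# which is why the IDs are increasing but not consecutive.
print(monotonically_increasing_id(1, 0))  # 8589934592
```

This also explains why the gaps between IDs can be huge: each partition starts at its own multiple of 2**33.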

Spark - ElasticSearch Integration

2021-11-22 Thread Siva Samraj
(dfKafkaPayload.select("value").as[String]).schema But while executing the same via a Spark Streaming job, we cannot do the above since streaming can have only one action. Please let me know. Thanks Siva

Scheduling Time > Processing Time

2021-06-20 Thread Siva Tarun Ponnada
Hi Team, I have a Spark Streaming job which I am running in a single-node cluster. I often see Scheduling Time > Processing Time in the streaming statistics a few minutes after my application starts. What does that mean? Should I increase the number of receivers? Regards Tarun

Re: Spark Streaming ElasticSearch

2020-10-05 Thread Siva Samraj
Hi Jainshasha, I need to read each row from the Dataframe and make some changes to it before inserting it into ES. Thanks Siva On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote: > Hi Siva > > To emit data into ES using a spark structured streaming job you need to use > ElasticSearch j

Spark Streaming ElasticSearch

2020-10-05 Thread Siva Samraj
Hi Team, I have a Spark Streaming job which will read from Kafka and write into Elastic via HTTP request. I want to validate each request from Kafka, change the payload as per business need, and write it into Elasticsearch. I have used ES HTTP requests to push the data into Elasticsearch. Can
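The validate-then-reshape step described above can be separated from the Spark and HTTP plumbing and tested on its own. A hypothetical sketch of such a transform (the field names and validation rules here are invented for illustration, not taken from the original job):

```python
import json
from typing import Optional


def transform_payload(raw: str) -> Optional[dict]:
    """Validate one Kafka record and reshape it for indexing.
    Returns None for records that fail validation so the caller
    can drop them instead of POSTing bad documents to Elasticsearch."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if "id" not in doc:          # hypothetical mandatory field
        return None
    doc["id"] = str(doc["id"])   # hypothetical business rule: ES doc ids are strings
    return doc


print(transform_payload('{"id": 7, "amount": 12.5}'))
print(transform_payload("not json"))  # None
```

Keeping the transform pure like this makes it unit-testable without a Kafka cluster or an Elasticsearch endpoint.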

Offset Management in Spark

2020-09-30 Thread Siva Samraj
Hi all, I am using Spark Structured Streaming (version 2.3.2). I need to read from a Kafka cluster and write into Kerberized Kafka. Here I want to use Kafka for offset checkpointing after the record is written into Kerberized Kafka. Questions: 1. Can we use Kafka for checkpointing to manage offset

Re: Spark structural streaming sinks output late

2020-03-28 Thread Siva Samraj
Yes, I am also facing the same issue. Did you figure it out? On Tue, 9 Jul 2019, 7:25 pm Kamalanathan Venkatesan, < kamalanatha...@in.ey.com> wrote: > Hello, > > > > I have the below spark structural streaming code and I was expecting the > results to be printed on the console every 10 seconds. But, I

Spark Streaming Code

2020-03-28 Thread Siva Samraj
Hi Team, I need help with the windowing & watermark concepts. This code is not working as expected. package com.jiomoney.streaming import org.apache.spark.sql.SparkSession import org.apache.spark.sql.functions._ import org.apache.spark.sql.streaming.ProcessingTime object SlingStreaming { def

Re: Spark Streaming

2018-11-26 Thread Siva Samraj
ect statement. If I'm not mistaken, it is known > as a bit costly since each call would produce a new Dataset. Defining > schema and using "from_json" will eliminate all the call of withColumn"s" > and extra calls of "get_json_object". > > - Jungtaek

Spark Streaming

2018-11-26 Thread Siva Samraj
Hello All, I am using Spark version 2.3 and I am trying to write a Spark Streaming join. It is a basic join and it is taking a long time to join the stream data. I am not sure what configuration we need to set on Spark. Code: * import org.apache.spark.sql.SparkSession import

Failed to create file system watcher service: User limit of inotify instances reached or too many open files

2018-08-22 Thread Polisetti, Venkata Siva Rama Gopala Krishna
Hi, when I am doing calculations for, for example, 700 listIDs, it is saving only some 50 rows and then I am getting some random exceptions. I get the below exception when I try to do calculations on huge data and try to save it. Please let me know if you have any suggestions. Sample Code: I have some

java.nio.file.FileSystemException: /tmp/spark- .._cache : No space left on device

2018-08-17 Thread Polisetti, Venkata Siva Rama Gopala Krishna
Hi, I am getting the below exception when I run spark-submit on a Linux machine; can someone give a quick solution with commands? Driver stacktrace: - Job 0 failed: count at DailyGainersAndLosersPublisher.scala:145, took 5.749450 s org.apache.spark.SparkException: Job aborted due to stage failure: Task 4

Re: Not able to overwrite cassandra table using Spark

2018-06-27 Thread Siva Samraj
You can try with this, it will work: val finaldf = merchantdf.write.format("org.apache.spark.sql.cassandra").mode(SaveMode.Overwrite).option("confirm.truncate", true).options(Map("table" -> "tablename", "keyspace" -> "keyspace")).save() On Wed 27 Jun,

Scala Partition Question

2018-06-12 Thread Polisetti, Venkata Siva Rama Gopala Krishna
Hello, can I do complex data manipulations inside the groupBy function? i.e. I want to group my whole dataframe by a column and then do some processing for each group.

Re: Spark Streaming Small files in Hive

2017-10-29 Thread Siva Gudavalli
Hello Asmath, we had a similar challenge recently. When you write back to Hive, you are creating files on HDFS, and it depends on your batch window. If you increase your batch window, let's say from 1 min to 5 mins, you will end up creating 5x fewer files. The other factor is your partitioning.
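The batch-window arithmetic in the reply above is easy to sanity-check: with a fixed number of output partitions per micro-batch, file count scales inversely with the window length. A rough back-of-the-envelope helper (an assumed model where every micro-batch writes one file per output partition, ignoring coalesce/repartition):

```python
def files_per_hour(batch_window_minutes: float, partitions_per_batch: int) -> int:
    """Estimate HDFS files created per hour when every micro-batch
    writes one file per output partition."""
    batches_per_hour = 60 / batch_window_minutes
    return int(batches_per_hour * partitions_per_batch)


# Going from a 1-minute to a 5-minute window cuts the file count 5x:
print(files_per_hour(1, 10))  # 600
print(files_per_hour(5, 10))  # 120
```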

Re: Orc predicate pushdown with Spark Sql

2017-10-27 Thread Siva Gudavalli
t it reads > the file, but it should not read all the content, which is probably also not > happening. > > On 24. Oct 2017, at 18:16, Siva Gudavalli <gudavalli.s...@yahoo.com.INVALID > <mailto:gudavalli.s...@yahoo.com.INVALID>> wrote: > >> >> Hello, >> &

Re: Orc predicate pushdown with Spark Sql

2017-10-24 Thread Siva Gudavalli
92 DESC], output=[id#192]) +- ConvertToSafe +- Project [id#192] +- Filter (usr#199 = AA0YP) +- HiveTableScan [id#192,usr#199], MetastoreRelation default, hlogsv5, None, [(cdt#189 = 20171003),(usrpartkey#191 = hhhUsers)]   please let me know if i am missing anything here. thank you On Monday,

Orc predicate pushdown with Spark Sql

2017-10-23 Thread Siva Gudavalli
Hello, I am working with Spark SQL to query a Hive managed table (in ORC format). I have my data organized by partitions and was asked to set indexes for each 50,000 rows by setting ('orc.row.index.stride'='5') let's say -> after evaluating the partition there are around 50 files in which the data is

Partition and Sort by together

2017-10-12 Thread Siva Gudavalli
Hello, I have my data stored in Parquet file format. My data is already partitioned by date and key. Now I want the data in each file to be sorted by a new Code column.
date1
    -> key1
        -> paqfile1
        -> paqfile2
    -> key2
        -> paqfile1
        -> paqfile2
date2
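In plain terms the layout being asked for is: bucket rows by (date, key), then sort the rows inside each bucket by the new Code column. A small pure-Python model of that output contract (not Spark code; in Spark this maps roughly to repartitioning by the partition columns and sorting within partitions before the write):

```python
from collections import defaultdict


def partition_and_sort(rows, partition_cols=("date", "key"), sort_col="code"):
    """Bucket rows by the partition columns, then sort each bucket
    by sort_col - modeling the on-disk layout described above."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[c] for c in partition_cols)].append(row)
    return {k: sorted(v, key=lambda r: r[sort_col]) for k, v in buckets.items()}


rows = [
    {"date": "d1", "key": "k1", "code": 3},
    {"date": "d1", "key": "k1", "code": 1},
    {"date": "d2", "key": "k2", "code": 2},
]
out = partition_and_sort(rows)
print([r["code"] for r in out[("d1", "k1")]])  # [1, 3]
```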

Spark SQL Parallelism - While reading from Oracle

2016-08-10 Thread Siva A
operation using only one task. I couldn't increase the parallelism. Thanks in advance Thanks Siva

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Use Spark XML version 0.3.3:
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.3.3</version>
</dependency>
On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote: > Hi Siva > > This is what i have for jars. Did you manage to run with these or > different versions ? > > > > org.apache.s

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Hi Marco, I did run it in the IDE (IntelliJ) as well. It works fine. VG, make sure the right jar is in the classpath. --Siva On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > and your eclipse path is correct? > i suggest, as Siva did before, to build your jar an

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Try to import the class and see if you are getting a compilation error: import com.databricks.spark.xml Siva On Fri, Jun 17, 2016 at 4:02 PM, VG <vlin...@gmail.com> wrote: > nopes. eclipse. > > > On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If you are running from IDE, Are you using Intellij? On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote: > Can you try to package as a jar and run using spark-submit > > Siva > > On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote: >

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Can you try to package as a jar and run using spark-submit Siva On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote: > I am trying to run from IDE and everything else is working fine. > I added spark-xml jar and now I ended up into this dependency > > 6/0

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If it's not working, add the package list while executing spark-submit/spark-shell like below: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3 $SPARK_HOME/bin/spark-submit --packages com.databricks:spark-xml_2.10:0.3.3 On Fri, Jun 17, 2016 at 2:56 PM, Siva

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Just try to use "xml" as the format like below: SQLContext sqlContext = new SQLContext(sc); DataFrame df = sqlContext.read().format("xml").option("rowTag", "row").load("A.xml"); FYR: https://gi

Re: how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
e changes will break > Java serialization. > > On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli <gss.su...@gmail.com> > wrote: > >> hello, >> >> i am writing a spark streaming application to read data from kafka. I am >> using no receiver approach and

how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
Hello, I am writing a Spark Streaming application to read data from Kafka. I am using the no-receiver approach and enabled checkpointing to make sure I am not reading messages again in case of failure (exactly-once semantics). I have a quick question: how does checkpointing need to be configured to

Re: Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
has been provided to all > the executors in your cluster. Most of the class not found errors got > resolved for me after making required jars available in the SparkContext. > > Thanks. > > From: Ted Yu <yuzhih...@gmail.com> > Date: Saturday, 12 March 2016 at 7:17 AM &g

Spark Streaming: java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream

2016-03-11 Thread Siva
Hi Everyone, all of a sudden we are encountering the below error from one of the Spark consumers. It used to work before without any issues. When I restart the consumer with the latest offsets, it works fine for some time (it executes a few batches) and then fails again; this issue is intermittent.

Re: saveAsTextFile is not writing to local fs

2016-02-01 Thread Siva
t; > > Mohammed > > Author: Big Data Analytics with Spark > <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> > > > > *From:* Siva [mailto:sbhavan...@gmail.com] > *Sent:* Friday, January 29, 2016 5:40 PM > *To:* Mohammed Guller >

saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
Hi Everyone, we are using Spark 1.4.1 and we have a requirement of writing data to local fs instead of HDFS. When trying to save an RDD to local fs with saveAsTextFile, it just writes a _SUCCESS file in the folder with no part- files, and also no error or warning messages on the console. Is there any

Re: saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
r you running Spark on a single machine? > > > > You can change Spark’s logging level to INFO or DEBUG to see what is going > on. > > > > Mohammed > > Author: Big Data Analytics with Spark > <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/&g

Hive is unable to read avro file written by spark avro

2016-01-13 Thread Siva
Hi Everyone, Avro data written by a dataframe to HDFS is not readable by Hive. I am saving data in avro format with the below statement: df.save("com.databricks.spark.avro", SaveMode.Append, Map("path" -> path)) I created a Hive avro external table and while reading I see all nulls. Did anyone face a similar

Re: spark-submit is ignoring "--executor-cores"

2015-12-22 Thread Siva
Thanks a lot Saisai and Zhan. I see DefaultResourceCalculator is currently being used for the Capacity Scheduler. We will change it to DominantResourceCalculator. Thanks, Sivakumar Bhavanari. On Mon, Dec 21, 2015 at 5:56 PM, Zhan Zhang wrote: > BTW: It is not only a Yarn-webui
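For reference, the scheduler change discussed above is made in YARN's capacity-scheduler.xml: with the default DefaultResourceCalculator the Capacity Scheduler allocates by memory only, so every container reports 1 vcore regardless of --executor-cores. A sketch of the property (the file location varies by distribution, so verify the path for yours):

```xml
<!-- capacity-scheduler.xml: make the Capacity Scheduler
     account for CPU (vcores) as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

A ResourceManager restart (or queue refresh, depending on version) is typically needed for the change to take effect.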

spark-submit is ignoring "--executor-cores"

2015-12-21 Thread Siva
Hi Everyone, I am observing a strange problem while submitting a Spark Streaming job in yarn-cluster mode through spark-submit. All the executors are using only 1 vcore irrespective of the value of the parameter --executor-cores. Are there any config parameters that override the --executor-cores value? Thanks,

Re: Spark with log4j

2015-12-21 Thread Siva
Hi Kalpesh, just to add, you could use "yarn logs -applicationId " to see aggregated logs once the application is finished. Thanks, Sivakumar Bhavanari. On Mon, Dec 21, 2015 at 3:56 PM, Zhan Zhang wrote: > Hi Kalpesh, > > If you are using spark on yarn, it may not work.

Re: spark-submit is ignoring "--executor-cores"

2015-12-21 Thread Siva
, Saisai Shao <sai.sai.s...@gmail.com> wrote: > Hi Siva, > > How did you know that --executor-cores is ignored and where did you see > that only 1 Vcore is allocated? > > Thanks > Saisai > > On Tue, Dec 22, 2015 at 9:08 AM, Siva <sbhavan...@gmail.com> wrote: >

Spark sql-1.4.1 DataFrameWrite.jdbc() SaveMode.Append

2015-11-24 Thread Siva Gudavalli
Ref: https://issues.apache.org/jira/browse/SPARK-11953 In Spark 1.3.1 we have 2 methods, i.e. createJDBCTable and insertIntoJDBC. They are replaced with write.jdbc() in Spark 1.4.1. createJDBCTable allows one to perform CREATE TABLE, i.e. DDL on the table, followed by INSERT (DML). insertIntoJDBC

spark 1.4.1 to oracle 11g write to an existing table

2015-11-23 Thread Siva Gudavalli
Hi, I am trying to write a dataframe from Spark 1.4.1 to Oracle 11g. I am using dataframe.write.mode(SaveMode.Append).jdbc(url, tablename, properties). This always tries to create a table. I would like to insert records into an existing table instead of creating a new one each time.

Monitoring tools for spark streaming

2015-09-28 Thread Siva
Hi, could someone recommend monitoring tools for Spark Streaming? By extending StreamingListener we can dump the delay in processing of batches and some alert messages. But are there any web UI tools where we can monitor failures, see delays in processing and error messages, and set up alerts

Hbase Spark streaming issue.

2015-09-21 Thread Siva
r$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/09/20 22:39:10 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 16, localhost): java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.98.4.2.2.4.2-2-hadoop2 Thanks, Siva.

Re: Spark - Eclipse IDE - Maven

2015-07-24 Thread Siva Reddy
I want to program in Scala for Spark. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Eclipse-IDE-Maven-tp23977p23981.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark - Eclipse IDE - Maven

2015-07-23 Thread Siva Reddy
. Thanks Siva

SF / East Bay Area Stream Processing Meetup next Thursday (6/4)

2015-05-27 Thread Siva Jagadeesan
http://www.meetup.com/Bay-Area-Stream-Processing/events/219086133/ Thursday, June 4, 2015 6:45 PM TubeMogul http://maps.google.com/maps?f=qhl=enq=1250+53rd%2C+Emeryville%2C+CA%2C+94608%2C+us 1250 53rd St #1 Emeryville, CA 6:45PM to 7:00PM - Socializing 7:00PM to 8:00PM - Talks 8:00PM to

Announcing SF / East Bay Area Stream Processing Meetup

2015-01-21 Thread Siva Jagadeesan
/218816482/?action=detaileventId=218816482 We meet every month in East Bay (Emeryville, CA). I am looking for someone to give a talk about Spark for the next meetup (Feb 5th) Let me know if you are interested in giving a talk. Thanks, -- Siva Jagadeesan

exception while running pi example on yarn cluster

2014-03-08 Thread Venkata siva kamesh Bhallamudi
Hi All, I am new to Spark and am running the pi example on a Yarn cluster. I am getting the following exception: Exception in thread main java.lang.NullPointerException at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114) at