to surface the problem.
Can someone review the code and tell me if I am doing something wrong?
regards
Sunita
for failed tasks were done, other tasks completed.
You can set it to a higher or lower value depending on how many more tasks
you have and how long they take to complete.
regards
Sunita
On Fri, Nov 13, 2015 at 4:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> I searched the code base and
(UnsupportedOperationChecker.scala:297)
regards
Sunita
On Mon, Sep 18, 2017 at 10:15 AM, Michael Armbrust <mich...@databricks.com>
wrote:
> You specify the schema when loading a dataframe by calling
> spark.read.schema(...)...
>
> On Tue, Sep 12, 2017 at 4:50 PM, Sunita Arvind <
usecase.
Is there a way to change the owner of files written by Spark?
regards
Sunita
> Le 13 sept. 2017 01:51, "Sunita Arvind" <sunitarv...@gmail.com> a écrit :
>
> Hi Michael,
>
> I am wondering what I am doing wrong. I get an error like:
>
> Exception in thread "main" java.lang.IllegalArgumentException: Schema
> must be specified when
e.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:278)
at
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:282)
at
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:222)
While running on the EMR cluster all paths poi
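For what it's worth, a minimal sketch of supplying an explicit schema to a streaming file source, which avoids the "Schema must be specified" error above (the field names and path are made-up examples, not from the actual job):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object SchemaExample {
  // File-based streaming sources do not infer schemas by default,
  // so one must be given explicitly via .schema(...).
  val eventSchema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("value", DoubleType, nullable = true)
  ))

  def readEvents(spark: SparkSession): DataFrame =
    spark.readStream
      .schema(eventSchema)       // avoids the IllegalArgumentException at startQuery
      .parquet("/tmp/events-in") // hypothetical input directory
}
```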
Thanks for your response, Praneeth. We did consider Kafka; however, cost was
the only holdback, since we might need a larger cluster, and the existing
cluster is on premise while my app is in the cloud, so the same cluster
cannot be used.
But I agree it does sound like a good alternative.
Regards
Sunita
Thanks for your response Michael
Will try it out.
Regards
Sunita
On Wed, Aug 23, 2017 at 2:30 PM Michael Armbrust <mich...@databricks.com>
wrote:
> If you use structured streaming and the file sink, you can have a
> subsequent stream read using the file source. This will main
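A rough sketch of that pattern as I understand it (host, port, paths, and checkpoint locations are all illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Sketch: query A writes a stream out with the file sink; query B reads
// the same directory back as a file source (which needs an explicit schema).
object FileSinkChain {
  // The socket source produces a single string column named "value".
  val stageSchema: StructType = StructType(Seq(StructField("value", StringType)))

  def run(spark: SparkSession): Unit = {
    val upstream = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    upstream.writeStream
      .format("parquet")
      .option("path", "/tmp/stage")                // shared directory
      .option("checkpointLocation", "/tmp/ckpt-a")
      .start()

    val downstream = spark.readStream.schema(stageSchema).parquet("/tmp/stage")
    downstream.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/ckpt-b")
      .start()

    spark.streams.awaitAnyTermination()
  }
}
```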
to be error prone. When either of the jobs
gets delayed due to bursts or any error/exception, this could lead to huge
data losses and non-deterministic behavior. What are the other alternatives
to this?
Appreciate any guidance in this regard.
regards
Sunita Koppar
as parquet with null in the numeric fields.
Is there a workaround for it? I need to be able to allow null values
for numeric fields.
Thanks in advance.
regards
Sunita
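One workaround I'm aware of (the case class here is a made-up example, not from the actual job): a Scala primitive like `Double` cannot hold null, so a case class field typed `Double` yields a non-nullable column, while wrapping the numeric fields in `Option` lets Spark write them as nullable.

```scala
// Option[Double] maps to a nullable DoubleType column when Spark derives
// the schema; a bare Double would be non-nullable and reject nulls.
case class Reading(sensor: String, value: Option[Double])

// Usage sketch (assuming an existing SparkSession `spark`):
//   import spark.implicits._
//   Seq(Reading("a", Some(1.5)), Reading("b", None))
//     .toDF()
//     .write.parquet("/tmp/readings")   // the `value` column is nullable
```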
sure I am not doing overkill or overlooking a
potential issue.
regards
Sunita
On Tue, Oct 25, 2016 at 2:38 PM, Sunita Arvind <sunitarv...@gmail.com>
wrote:
> The error in the file I just shared is here:
>
> val partitionOffsetPath:String = topicDirs.consumerOffsetDir + "/&q
Thanks for the response Sean. I have seen the NPE on similar issues very
consistently and assumed that could be the reason :) Thanks for clarifying.
regards
Sunita
On Tue, Oct 25, 2016 at 10:11 PM, Sean Owen <so...@cloudera.com> wrote:
> This usage is fine, because you are o
create the dataframe in main, you can register it as a table and
run the queries in the main method itself. You don't need to coalesce or run
the method within foreach.
Regards
Sunita
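A minimal sketch of that suggestion (the table, column names, and data are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession

object QueryInMain {
  // Build the dataframe once in main, register it, and run SQL right there;
  // no coalesce and no foreach needed.
  def matchingKeys(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "cnt")
    df.createOrReplaceTempView("counts") // registerTempTable in Spark 1.x
    spark.sql("SELECT key FROM counts WHERE cnt > 1").as[String].collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("q").getOrCreate()
    try println(matchingKeys(spark).mkString(","))
    finally spark.stop()
  }
}
```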
On Tuesday, October 25, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
> Jeff, Thanks for your response.
eeper")
df.saveAsParquetFile(conf.getString("ParquetOutputPath")+offsetSaved)
LogHandler.log.info("Created the parquet file")
}
Thanks
Sunita
On Tue, Oct 25, 2016 at 2:11 PM, Sunita Arvind <sunitarv...@gmail.com>
wrote:
> Attached is the edi
Sunita
On Tue, Oct 25, 2016 at 1:52 PM, Sunita Arvind <sunitarv...@gmail.com>
wrote:
> Thanks for confirming Cody.
> To get to use the library, I had to do:
>
> val offsetsStore = new ZooKeeperOffsetsStore(conf.getString("zkHosts"),
> "/consumers/topics/"+ t
I want the library to pick up all the partitions for a topic without me
specifying the path. Is that possible out of the box, or do I need to tweak it?
regards
Sunita
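I don't know whether the library supports discovering partitions itself; as a sketch of a possible tweak (the `/consumers/topics` layout just mirrors the path used in the earlier snippet, and the helper itself is hypothetical), the per-partition paths could be derived rather than hard-coded:

```scala
// Hypothetical helper: derive one offset path per partition for a topic,
// so every partition is picked up without naming paths by hand. With a
// live ZooKeeper client, the partition ids could instead be discovered by
// listing the children of the topic directory.
object OffsetPaths {
  def topicDir(topic: String): String = s"/consumers/topics/$topic"

  def partitionPaths(topic: String, numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(p => s"${topicDir(topic)}/$p")
}
```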
On Tue, Oct 25, 2016 at 12:08 PM, Cody Koeninger <c...@koeninger.org> wrote:
> You are correct that you shouldn't have to worry about broker id.
Just re-read the Kafka architecture. Something that slipped my mind is that it
is leader-based, so a topic/partition pair will be the same on all the brokers,
and we do not need to consider the broker id while storing offsets. Still
exploring the rest of the items.
regards
Sunita
On Tue, Oct 25, 2016 at 11:09 AM
not considering brokerIds while
storing offsets, and the OffsetRanges probably does not have them either; it
can only provide the topic, partition, and from/until offsets.
I am probably missing something very basic, and the library probably works fine
by itself. Can someone/Cody explain?
Cody, Thanks a lot for
Hello Experts,
Is there a way to get spark to write to elasticsearch asynchronously?
Below are the details
http://stackoverflow.com/questions/39624538/spark-savetoes-asynchronously
regards
Sunita
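As far as I know, saveToEs from elasticsearch-hadoop blocks until the write finishes; one common workaround is to run the blocking call on another thread via a Future so the driver can do other work meanwhile. A sketch (the `write` thunk stands in for the actual saveToEs call, which is not reproduced here):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Wrap a blocking write in a Future so it runs off the calling thread.
object AsyncWrite {
  def writeAsync[T](write: () => T)(implicit ec: ExecutionContext): Future[T] =
    Future(write())
}

// Usage sketch (rdd and index name are hypothetical):
//   import scala.concurrent.ExecutionContext.Implicits.global
//   val f = AsyncWrite.writeAsync(() => EsSpark.saveToEs(rdd, "idx/doc"))
//   // ... do other work on the driver ...
//   Await.result(f, 10.minutes)  // surface any write failure before exiting
```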
interesting
observation is that bringing down the executor memory to 5 GB with executor
memoryOverhead at 768 showed significant performance gains. What are the
other associated settings?
regards
Sunita
Thank you for your inputs. Will test it out and share my findings
On Thursday, July 14, 2016, CosminC wrote:
> Didn't have the time to investigate much further, but the one thing that
> popped out is that partitioning was no longer working on 1.6.1. This would
> definitely
I am facing the same issue. Upgrading to Spark 1.6 is causing a huge
performance loss. Were you able to solve this issue? I am also attempting the
memory settings mentioned at
http://spark.apache.org/docs/latest/configuration.html#memory-management
but it's not making a lot of difference. Appreciate your inputs.
trying to figure out if I can use the
(iterator: Iterator[(K, Seq[V], Option[S])]) variant but haven't figured it out yet.
Appreciate any suggestions in this regard.
regards
Sunita
P.S:
I am aware of mapwithState but not on the latest version as of now.
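For what it's worth, the iterator-based updateStateByKey variant takes a function over (key, new values, previous state) triples and returns the updated (key, state) pairs. A pure sketch of such a function, here keeping a running count per key (the types and logic are illustrative, not from the actual job):

```scala
// Pure update function of the shape updateStateByKey's iterator variant
// expects: Iterator[(K, Seq[V], Option[S])] => Iterator[(K, S)].
object StateUpdate {
  def updateFunc(it: Iterator[(String, Seq[Int], Option[Long])]): Iterator[(String, Long)] =
    it.map { case (key, newValues, state) =>
      // Add the number of newly arrived values to the previous count.
      key -> (state.getOrElse(0L) + newValues.size)
    }
}

// With a DStream (hypothetical usage):
//   pairs.updateStateByKey(StateUpdate.updateFunc _, partitioner, true)
```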
distribution data sets. Mentioning it here for
benefit of anyone else stumbling upon the same issue.
regards
Sunita
On Wed, Jun 22, 2016 at 8:20 PM, Sunita Arvind <sunitarv...@gmail.com>
wrote:
> Hello Experts,
>
> I am getting this error repeatedly:
>
> 16
r.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 38 more
16/06/23 11:09:53 INFO SparkContext: Invoking stop() from shutdown hook
I've tried Kafka versions 0.8.2.0, 0.8.2.2, and 0.9.0.0. With 0.9.0.0, the
processing hangs much sooner.
Can someone help with this error?
regards
ssc.awaitTermination()
}
}
}
I also tried putting all the initialization directly in main (not using
method calls for initializeSpark and createDataStreamFromKafka), and also
tried not putting it in foreach, creating a single Spark and streaming
context instead. However, the error persists.
Appreciate any help.
regards
Sunita
or do I need to have a HiveContext in order to see
the tables registered via the Spark application through JDBC?
regards
Sunita
Thanks for the clarification Michael and good luck with Spark 2.0. It
really looks promising.
I am especially interested in the ad hoc queries aspect. Probably that is what
is being referred to as Continuous SQL in the slides. What is the timeframe
for the availability of this functionality?
regards
Sunita
in 2.1 or later only
regards
Sunita
On Fri, May 6, 2016 at 1:06 PM, Michael Malak <michaelma...@yahoo.com>
wrote:
> At first glance, it looks like the only streaming data sources available
> out of the box from the github master branch are
> https://github.com/apache/spark/blob/mast
to
ensure it works for our use cases.
Can someone point me to relevant material on this?
regards
Sunita
of years is 10
Within 10 years is true
()
Appreciate any direction from the community.
regards
Sunita
Exception in thread "main" scala.reflect.internal.MissingRequirementError:
class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with
primordial classloader with boot classpath
[C
Exchange (HashPartitioning [education#18], 200)
ParquetTableScan [education#18,education_desc#19], (ParquetRelation
C:/Sunita/eclipse/workspace/branch/trial/plsresources/plsbuyer/cg_pq_cdw_education,
Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml,
mapred-site.xml, yarn
of effort for us to try this approach and weigh the
performance, as we need to register the output as tables to proceed using
them. Hence I would appreciate inputs from the community before proceeding.
Regards
Sunita Koppar
I was able to resolve this by adding rdd.collect() after every stage. This
forced RDD evaluation and helped avoid the choke point.
regards
Sunita Koppar
On Sat, Jan 17, 2015 at 12:56 PM, Sunita Arvind sunitarv...@gmail.com
wrote:
Hi,
My spark jobs suddenly started getting hung and here
names. The Spark SQL wiki has good examples of this. It looks easier to
manage to me than your solution below.
I agree with you that when there are a lot of columns, calling
row.getString() even once is not convenient.
Regards
Sunita
On Tuesday, January 20, 2015, Night Wolf nightwolf...@gmail.com
will edit it and post.
regards
Sunita
(SQLContext.scala:94)
at croevss.StageJoin$.vsswf(StageJoin.scala:162)
at croevss.StageJoin$.main(StageJoin.scala:41)
at croevss.StageJoin.main(StageJoin.scala)
regards
Sunita Koppar
)
regards
Sunita
On Tue, Nov 25, 2014 at 11:47 PM, Sameer Farooqui same...@databricks.com
wrote:
Hi Sunita,
This gitbook may also be useful for you to get Spark running in local mode
on your Windows machine:
http://blueplastic.gitbooks.io/how-to-light-your-spark-on-a-stick/content/
On Tue, Nov 25
your help.
regards
Sunita
-tolerance. Does that mean that
GraphX makes the typical RDBMS operations possible even when the data is
persisted in a GDBMS, and not vice versa?
regards
Sunita
-started-with-spark-deploy-spark-server-and-compute-pi-from-your-web-browser/
Romain
On Tue, Jun 24, 2014 at 9:04 AM, Sunita Arvind sunitarv...@gmail.com
wrote:
Hello Experts,
I am attempting to integrate Spark Editor with Hue