about side effects to any application, including Spark (memory consumption,
etc.).
On 21 Nov 2016, at 18:26, Samy Dindane <s...@dindane.com> wrote:
Hi,
I'd like to extend the file:// file system and add some custom logic to the API
that lists files.
I think I need to extend FileSystem or LocalFileSystem from
org.apache.hadoop.fs, but I am not sure how to go about it exactly.
How to write a custom file system and make it usable by Spark?
, you'll always get the latest version.
Daniel
On Thu, Nov 17, 2016 at 9:05 PM, Samy Dindane <s...@dindane.com> wrote:
Hi,
I have some data partitioned this way:
/data/year=2016/month=9/version=0
/data/year=2016/month=10/version=0
/data/year=2016/month=10/version=1
/data/year=2016/month=10/version=2
/data/year=2016/month=10/version=3
/data/year=2016/month=11/version=0
/data/year=2016/month=11/version=1
When
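Given a layout like the one above, one way to read only the newest version per (year, month) is to filter the directory listing before handing it to the reader. A minimal sketch, assuming paths shaped exactly like the listing; the object and method names are illustrative, not from this thread:

```scala
// Sketch: keep only the highest "version=" directory per (year, month)
// from partition paths like those listed above.
object LatestVersion {
  private val P = raw".*/year=(\d+)/month=(\d+)/version=(\d+)".r

  def latestOnly(paths: Seq[String]): Seq[String] =
    paths
      .collect { case p @ P(y, m, v) => ((y.toInt, m.toInt), v.toInt, p) }
      .groupBy(_._1)                 // group by (year, month)
      .values
      .map(_.maxBy(_._2)._3)         // keep the path with the max version
      .toSeq
      .sorted
}
```

The filtered list could then be passed to the DataFrame reader (e.g. as multiple paths), so only the latest version of each month is loaded.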
Hi,
In order to impersonate a user when submitting a job with `spark-submit`, the
`proxy-user` option is used.
Is there a similar feature when running a job inside a Scala program? Maybe by
specifying some configuration value?
Thanks.
Samy
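As far as I can tell, `spark-submit`'s `--proxy-user` is built on Hadoop's `UserGroupInformation.doAs`, so one approach from a Scala program is to wrap the SparkContext construction in a `doAs` block via `UserGroupInformation.createProxyUser(user, UserGroupInformation.getCurrentUser()).doAs(...)`. The self-contained sketch below shows only the JDK primitive underneath (`javax.security.auth.Subject.doAs`), since Hadoop classes aren't on the classpath here; treat the whole recipe as an assumption to verify, not a confirmed one:

```scala
import java.security.PrivilegedExceptionAction
import javax.security.auth.Subject

// JDK-level shape of the doAs pattern that Hadoop's UserGroupInformation
// wraps. In a real job, `body` would build the SparkContext so that all
// file-system and cluster calls run as the proxied user.
object ProxyRun {
  def runAs[T](subject: Subject)(body: => T): T =
    Subject.doAs(subject, new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
}
```

With Hadoop available, the `Subject` plumbing is replaced by the proxy-user `UserGroupInformation` instance, and the superuser must be allowed to impersonate in `core-site.xml` (`hadoop.proxyuser.*` settings).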
number does
not change.
BTW, how many kafka partitions are you using, and how many actually
have data for a given batch?
3 partitions.
All of them have more than maxRatePerPartition records (my topic has hundreds
of millions of records).
On Thu, Oct 13, 2016 at 4:33 AM, Samy Dindane &l
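For the direct stream, the per-batch cap implied by maxRatePerPartition works out to a simple product, since the rate is records per second per partition. A quick sanity check, with example numbers only (the thread doesn't give the actual batch interval or rate):

```scala
// Rough upper bound on records per micro-batch for the Kafka direct
// stream: rate is records/second/partition, so the cap scales with
// both the partition count and the batch interval.
def maxRecordsPerBatch(maxRatePerPartition: Long,
                       partitions: Int,
                       batchIntervalSec: Long): Long =
  maxRatePerPartition * partitions * batchIntervalSec
```

So with 3 partitions, 10-second batches, and maxRatePerPartition = 10000, a batch should top out around 300000 records; a first batch with hundreds of millions of records suggests the setting isn't being picked up.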
This partially answers the question: http://stackoverflow.com/a/35449563/604041
On 10/04/2016 03:10 PM, Samy Dindane wrote:
Hi,
I have the following schema:
-root
|-timestamp
|-date
|-year
|-month
|-day
|-some_column
|-some_other_column
I'd like to achieve either of these:
1
fe.org/e730492453.png
notice the cutover point
On Wed, Oct 12, 2016 at 11:00 AM, Samy Dindane <s...@dindane.com> wrote:
Hi,
I'd like a specific job to fail if there's another instance of it already
running on the cluster (Spark Standalone in my case).
How to achieve this?
Thank you.
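One possible approach, not from this thread: take an exclusive OS-level lock on a well-known file at startup and fail fast if it is already held. This only works when all candidate drivers can see the same file (a shared mount, or all submissions from one machine); for a multi-node Standalone cluster, a coordination service such as ZooKeeper is the more usual answer. Path and names below are illustrative:

```scala
import java.io.RandomAccessFile
import java.nio.channels.FileLock

// Sketch: acquire an exclusive file lock before the job starts.
// tryLock() returns null when another process holds the lock,
// which we surface as None so the caller can abort.
object SingletonJob {
  def tryAcquire(lockPath: String): Option[FileLock] = {
    val channel = new RandomAccessFile(lockPath, "rw").getChannel
    Option(channel.tryLock())
  }
}
```

Usage: if `tryAcquire("/var/run/myjob.lock")` returns `None`, exit before building the SparkContext; otherwise hold the lock for the job's lifetime (it is released when the JVM dies).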
I am 100% sure.
println(conf.get("spark.streaming.backpressure.enabled")) prints true.
On 10/12/2016 05:48 PM, Cody Koeninger wrote:
Just to make 100% sure, did you set
spark.streaming.backpressure.enabled
to true?
On Wed, Oct 12, 2016 at 10:09 AM, Samy Dindane <s...@dinda
Koeninger wrote:
http://spark.apache.org/docs/latest/configuration.html
"This rate is upper bounded by the values
spark.streaming.receiver.maxRate and
spark.streaming.kafka.maxRatePerPartition if they are set (see
below)."
On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane <s...@
rite fileA and fileB, because they already have
correct data and offsets. You just write fileC.
Then once you've recovered you go on about your job as normal, starting
at topic-0 offsets 60, topic-1 offsets 66
Clear as mud?
On Mon, Oct 10, 2016 at 5:36 PM, Samy Dindane <s...@dindane.com> wr
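The recovery scheme described above can be sketched in a few lines: persist, alongside each output file, the offset range it covers; on restart, leave complete files alone and resume consuming from the highest persisted end offset per partition. All types and names here are illustrative, not from the thread:

```scala
// Sketch of the recovery step: each output file records the Kafka
// offset range it contains, so after a crash we can compute where
// each partition should resume.
case class FileMeta(name: String, partition: String,
                    fromOffset: Long, untilOffset: Long)

object OffsetRecovery {
  // Highest end offset already safely on disk, per partition: the
  // next run starts consuming from these offsets.
  def resumeOffsets(persisted: Seq[FileMeta]): Map[String, Long] =
    persisted.groupBy(_.partition).map {
      case (p, fs) => p -> fs.map(_.untilOffset).max
    }
}
```

In the fileA/fileB/fileC example, fileA and fileB's metadata already cover the committed ranges, so only fileC is rewritten and consumption resumes at topic-0 offset 60 and topic-1 offset 66.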
Hi,
Is it possible to limit the size of the batches returned by the Kafka consumer
for Spark Streaming?
I am asking because the first batch I get has hundreds of millions of records
and it takes ages to process and checkpoint them.
Thank you.
Samy
I appreciate your help. Thanks a lot.
On Mon, Oct 10, 2016 at 12:12 PM, Samy Dindane <s...@dindane.com> wrote:
I just noticed that you're the author of the code I linked in my previous
email. :) It's helpful.
When using `foreachPartition` or `mapPartitions`, I noticed I can't ask
Spark to write t
you.
Samy
On 10/10/2016 04:58 PM, Samy Dindane wrote:
Hi Cody,
I am writing a Spark job that reads records from a Kafka topic and writes them
on the file system.
This would be straightforward if it weren't for the custom checkpointing logic
I want to have; Spark's checkpointing doesn't suit us