+1. It sounds awesome!
On Thu, Mar 21, 2024 at 14:16, Kiran Kumar Dusi wrote:
> +1
>
> On Thu, 21 Mar 2024 at 7:46 AM, Farshid Ashouri <
> farsheed.asho...@gmail.com> wrote:
>
>> +1
>>
>> On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh,
>> wrote:
>>
>>> Some of you may be aware that Databricks community Home |
wrote:
>>
>>> Did you check if mapreduce.fileoutputcommitter.algorithm.version 2 is
>>> supported on GCS? IIRC it wasn't, but you could check with GCP support
>>>
>>>
>>> On Mon, Jul 17, 2023 at 3:54 PM Dipayan Dev
>>> wr
You can try increasing fs.gs.batch.threads and fs.gs.max.requests.per.batch.
The definitions for these flags are available here -
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md
On Mon, 17 Jul 2023 at 14:59, Dipayan Dev wrote:
> No, I am using Spark
I think I found the issue, Hive metastore 2.3.6 doesn't have the necessary
support. After upgrading to Hive 3.1.2 I was able to run the select query.
On Sun, 20 Dec 2020 at 12:00, Jay wrote:
> Thanks Matt.
>
> I have set the two configs in my sparkConfig as below
>
"SELECT * FROM $tableName")
res2: org.apache.spark.sql.DataFrame = [col1: int]
But when I try to do .show() it returns an error
scala> spark.sql(s"SELECT * FROM $tableName").show()
org.apache.spark.sql.AnalysisException: Table does not support reads:
default.tblname_3;
Hi All -
I have currently setup a Spark 3.0.1 cluster with delta version 0.7.0 which
is connected to an external hive metastore.
I run the below set of commands :-
val tableName = "tblname_2"
spark.sql(s"CREATE TABLE $tableName(col1 INTEGER) USING delta options(path='GCS_PATH')")
*20/12/19
I am trying to integrate Spark with Hive on an HDInsight Spark cluster.
I copied hive-site.xml into the spark/conf directory. In addition, I added Hive
metastore properties such as the JDBC connection info in Ambari as well. But still the
databases and tables created using spark-sql are not visible in Hive.
Are you running this in local mode or cluster mode? If you are running in
cluster mode, have you ensured that numpy is present on all nodes?
On Tue 5 Jun, 2018, 2:43 AM @Nandan@,
wrote:
> Hi ,
> I am getting error :-
>
>
I might have missed it, but can you tell whether the OOM is happening in the driver
or an executor? Also, it would be good if you could post the actual exception.
On Tue 5 Jun, 2018, 1:55 PM Nicolas Paris, wrote:
> IMO your JSON cannot be read in parallel at all, so Spark only offers you
> to play again
The partitionBy clause is used to create Hive folders so that you can point
a Hive partitioned table at the data.
What are you using partitionBy for? What is the use case?
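
As a rough illustration of the "Hive folders" mentioned above: Hive-style partitioning lays data out in nested `column=value` directories. The sketch below only builds those directory names in plain Python; it does not call Spark, and the base directory, column names and rows are hypothetical.

```python
import posixpath


def partition_path(base_dir, partition_cols, row):
    """Build the Hive-style directory for one row: base/col1=v1/col2=v2."""
    parts = ["{}={}".format(col, row[col]) for col in partition_cols]
    return posixpath.join(base_dir, *parts)


# Hypothetical rows and column names, for illustration only.
rows = [
    {"bucket": "B01", "year": 2018, "value": 1},
    {"bucket": "B02", "year": 2018, "value": 2},
]

# One directory per distinct combination of partition values.
layout = sorted({partition_path("out", ["bucket", "year"], r) for r in rows})
```

A Hive partitioned table pointed at `out` would then treat `bucket` and `year` as partition columns derived from the directory names.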
On Mon 4 Jun, 2018, 4:59 PM purna pradeep, wrote:
> im reading below json in spark
>
> {"bucket": "B01",
Can you tell us what version of Spark you are using and whether Dynamic
Allocation is enabled?
Also, how are the files being read? Is it a single read of all files using
a file-matching regex, or are you running different threads in the same
pyspark job?
On Mon 4 Jun, 2018, 1:27 PM Shuporno
Benjamin,
The append will add the "new" data to the existing data without removing
the duplicates. You would need to overwrite the file every time if you need
unique values.
Thanks,
Jayadeep
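
The point about needing an overwrite to get unique values can be sketched in plain Python. This is only a model of the two behaviours, not Spark code; the function names and sample rows are hypothetical.

```python
def append(existing, new):
    """Append mode: new rows are added as-is, duplicates included."""
    return existing + new


def overwrite_distinct(existing, new):
    """Rewrite the whole dataset as the de-duplicated union (first-seen order)."""
    seen = set()
    result = []
    for row in existing + new:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical data: one row of the new batch already exists.
existing = [("a", 1), ("b", 2)]
new = [("b", 2), ("c", 3)]

appended = append(existing, new)
rewritten = overwrite_distinct(existing, new)
```

In this model `appended` still contains `("b", 2)` twice, while `rewritten` holds each row once, at the cost of rewriting the full dataset on every run.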
On Fri, Jun 1, 2018 at 9:31 PM Benjamin Kim wrote:
> I have a situation where I trying to add only new
The code that you see in github is for version 2.1. For versions below that
the default partitions for binary files is set to 2 which you can change by
using the minPartitions value. I am not sure starting 2.1 how the
minPartitions column will work because as you said the field is completely
Hi,
The problem has been solved simply by updating the Scala SDK version from the
incompatible 2.10.x to the correct version, 2.11.x.
From: Michael Jay <mich@outlook.com>
Sent: Tuesday, August 9, 2016 10:11:12 PM
To: user@spark.apache.org
Subject: spa
Dear all,
I am a newbie to Spark. Currently I am trying to import the source code of Spark
2.0 as a module into an existing client project.
I have imported spark-core, spark-sql and spark-catalyst as Maven dependencies
in this client project.
During compilation errors as missing
arkContext.(JavaSparkContext.scala:58)
--
jay vyas
not too worried about this - but it seems like it might be nice if
we could specify a user name as part of Spark's context or as
an external parameter, rather than having to
use the Java-based user/group extractor.
--
jay vyas
Thank you, that helps a lot.
On Mon, Feb 22, 2016 at 6:01 PM, Takeshi Yamamuro <linguin@gmail.com>
wrote:
> You're correct, reduceByKey is just an example.
>
> On Tue, Feb 23, 2016 at 10:57 AM, Jay Luan <jaylu...@gmail.com> wrote:
>
>> Could you elaborate on
Could you elaborate on how this would work?
So from what I can tell, this maps a key to a tuple which always has a 0 as
the second element. From there the hash widely changes because we now hash
something like ((1,4), 0) and ((1,3), 0). Thus mapping this would create
more even partitions. Why
Thanks for the reply, I'd like to export the decision splits for each tree
out to an external file which is read elsewhere not using spark. As far as
I know, saving a model to a path will save a bunch of binary files which
can be loaded back into spark. Is this correct?
On Feb 10, 2016 7:21 PM,
determine the root
cause, as I cannot replicate this issue.
Thanks for your help.
From: Ted Yu <yuzhih...@gmail.com>
Date: Friday, February 5, 2016 at 5:40 PM
To: Jay Shipper <shipper_...@bah.com>
Cc: "user
could
confirm they’re having this issue with Spark 1.6.0. Ideally, we should also
have some simple proof of concept that can be posted with the bug.
From: Ted Yu <yuzhih...@gmail.com>
Date: Wednesday, February 3, 2016 at 3:57 PM
To: Jay Shipper <ship
to
match what Spark 1.6.0 uses (2.6.0-cdh5.7.0-SNAPSHOT).
Thanks,
Jay
016 at 12:04 PM
To: Jay Shipper <shipper_...@bah.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: [External] Re: Spark 1.6.0 HiveContext NPE
Looks li
One quick update on this: The NPE is not happening with Spark 1.5.2, so this
problem seems specific to Spark 1.6.0.
From: Jay Shipper <shipper_...@bah.com>
Date: Wednesday, February 3, 2016 at 12:06 PM
To: "user@spark.apache.org<mailto:user@spark
Ah, thank you so much, this is perfect
On Fri, Nov 20, 2015 at 3:48 PM, Ali Tajeldin EDU
wrote:
> You can try to use an Accumulator (
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator)
> to keep count in map1. Note that the final
The answer is that my table was not serialized with Kryo, but I started the
spark-sql shell with Kryo, so the data could not be deserialized.
--
View this message in context:
a producer and a consumer, so that you
don't get a starvation scenario.
On Wed, Aug 12, 2015 at 7:31 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Is there a way to run spark streaming methods in standalone eclipse
environment to test out the functionality?
--
jay vyas
In general, the simplest way is to use the DynamoDB Java API as is: call it
inside a map(), and use the asynchronous put() DynamoDB API call.
On Aug 7, 2015, at 9:08 AM, Yasemin Kaya godo...@gmail.com wrote:
Hi,
Is there a way using DynamoDB in spark application? I have to
, 2015 at 11:11 PM, Dogtail Ray spark.ru...@gmail.com wrote:
Hi,
I have modified some Hadoop code, and want to build Spark with the
modified version of Hadoop. Do I need to change the compilation dependency
files? How to then? Great thanks!
--
jay vyas
My spark-sql command:
spark-sql --driver-memory 2g --master spark://hadoop04.xx.xx.com:8241 --conf
spark.driver.cores=20 --conf spark.cores.max=20 --conf
spark.executor.memory=2g --conf spark.driver.memory=2g --conf
spark.akka.frameSize=500 --conf spark.eventLog.enabled=true --conf
For additional commands, e-mail: user-h...@spark.apache.org
--
jay vyas
/StreamingContext.html.
The information on consumed offset can be recovered from the checkpoint.
On Tue, May 19, 2015 at 2:38 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
If a Spark streaming job stops at 12:01 and I resume the job at 12:02.
Will it still start to consume the data that were
Hi all,
I am currently using Spark streaming to consume and save logs every hour in
our production pipeline. The current setting is to run a crontab job to
check every minute whether the job is still there and if not resubmit a
Spark streaming job. I am currently using the direct approach for
19, 2015 at 12:42 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am currently using Spark streaming to consume and save logs every hour
in our production pipeline. The current setting is to run a crontab job to
check every minute whether the job is still there and if not resubmit
Hi all,
I am reading the docs of receiver-based Kafka consumer. The last parameters
of KafkaUtils.createStream is per topic number of Kafka partitions to
consume. My question is, does the number of partitions for topic in this
parameter need to match the number of partitions in Kafka.
For
to do with it.
On Wed, Apr 29, 2015 at 6:50 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
This function is called in foreachRDD. I think it should be running in
the executors. I add the statement.close() in the code and it is running. I
will let you know if this fixes the issue.
On Wed
each batch.
On Thu, Apr 30, 2015 at 11:15 AM, Cody Koeninger c...@koeninger.org wrote:
Did you use lsof to see what files were opened during the job?
On Thu, Apr 30, 2015 at 1:05 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
The data ingestion is in outermost portion in foreachRDD block
:07 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am using the direct approach to receive real-time data from Kafka in
the following link:
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
My code follows the word count direct example:
https://github.com
Hi all,
I am using the direct approach to receive real-time data from Kafka in the
following link:
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
My code follows the word count direct example:
...@gmail.com wrote:
Maybe add statement.close() in finally block ?
Streaming / Kafka experts may have better insight.
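
The "close it in a finally block" suggestion can be sketched as follows. The thread concerns a JDBC statement; this sketch uses Python's built-in sqlite3 cursor as an analog, and the table, columns and rows are hypothetical.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert rows and always release the cursor, even if the insert fails."""
    cur = conn.cursor()
    try:
        cur.executemany("INSERT INTO counts(word, n) VALUES (?, ?)", rows)
        conn.commit()
    finally:
        cur.close()  # runs on success and on error alike


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts(word TEXT, n INTEGER)")
insert_rows(conn, [("spark", 3), ("kafka", 5)])
total = conn.execute("SELECT SUM(n) FROM counts").fetchone()[0]
conn.close()
```

The same shape applies per batch: whatever is opened inside the batch function is released in `finally`, so handles do not accumulate over many batches.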
On Wed, Apr 29, 2015 at 2:25 PM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Thanks for the suggestion. I ran the command and the limit is 1024.
Based on my understanding
.
Please let me know If I am doing something wrong.
--
jay vyas
2.11.2.
Could it be that spark-1.3.0-bin-hadoop2.4.tgz requires a different version
of scala ?
Thanks,
Jay
On Apr 9, 2015, at 4:38 PM, Xiangrui Meng men...@gmail.com wrote:
Could you share ALSNew.scala? Which Scala version did you use? -Xiangrui
On Wed, Apr 8, 2015 at 4:09 PM, Jay
-submit from my downloaded
spark-1.30.
On Apr 6, 2015, at 1:37 PM, Jay Katukuri jkatuk...@apple.com wrote:
Here is the command that I have used :
spark-submit --class packagename.ALSNew --num-executors 100 --master yarn
ALSNew.jar -jar spark-sql_2.11-1.3.0.jar hdfs://input_path
Btw - I
-Xiangrui
On Mon, Apr 6, 2015 at 12:27 PM, Jay Katukuri jkatuk...@apple.com wrote:
Hi,
Here is the stack trace:
Exception in thread main java.lang.NoSuchMethodError:
scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
at ALSNew
= purchase.map ( line =>
line.split(',') match { case Array(user, item, rate) =>
(user.toInt, item.toInt, rate.toFloat)
}).toDF()
Any help is appreciated !
I have tried passing the spark-sql jar using the -jar spark-sql_2.11-1.3.0.jar
Thanks,
Jay
On Mar 17, 2015, at 12:50 PM
)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Thanks,
Jay
On Apr 6, 2015, at 12:24 PM, Xiangrui Meng men...@gmail.com wrote:
Please attach the full stack trace. -Xiangrui
On Mon, Apr 6, 2015 at 12:06 PM, Jay
.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
--
jay vyas
Just the same as spark was disrupting the hadoop ecosystem by changing the
assumption that you can't rely on memory in distributed analytics...now
maybe we are challenging the assumption that big data analytics need to be
distributed?
I've been asking the same question lately and seen similarly that
(PoolingHttpClientConnectionManager.java:114)
--
jay vyas
-https://wiki.apache.org/incubator/IgniteProposal has I think been updated
recently and has a good comparison.
- Although grid gain has been around since the spark days, Apache Ignite is
quite new and just getting started I think so
- you will probably want to reach out to the developers
!
--
jay vyas
Ah, never mind, I just saw
http://spark.apache.org/docs/1.2.0/sql-programming-guide.html (language
integrated queries) which looks quite similar to what I was thinking
about. I'll give that a whirl...
On Wed, Feb 11, 2015 at 7:40 PM, jay vyas jayunit100.apa...@gmail.com
wrote:
Hi spark
).by(product,meta=product.id=meta.id). toSchemaRDD ?
I know the above snippet is totally wacky but, you get the idea :)
--
jay vyas
just for
dealing with time stamps.
Whats the simplest and cleanest way to map non-spark time values into
SparkSQL friendly time values?
- One option could be a custom SparkSQL type, i guess?
- Any plan to have native spark sql support for Joda Time or (yikes)
java.util.Calendar ?
--
jay vyas
performing any operations on it. That
way, the EdgeRDDImpl class doesn't have to use the default partitioner.
Hope this helps!
Jay
On Tue Feb 03 2015 at 12:35:14 AM NicolasC nicolas.ch...@inria.fr wrote:
On 01/29/2015 08:31 PM, Ankur Dave wrote:
Thanks for the reminder. I just created a PR
Just curious, is this set to be merged at some point?
On Thu Jan 22 2015 at 4:34:46 PM Ankur Dave ankurd...@gmail.com wrote:
At 2015-01-22 02:06:37 -0800, NicolasC nicolas.ch...@inria.fr wrote:
I try to execute a simple program that runs the ShortestPaths algorithm
It's a very valid idea indeed, but it's a tricky subject: since the entire
ASF is run on mailing lists, there are many different but equally
sound ways of looking at this idea, which conflict with one another.
On Jan 21, 2015, at 7:03 AM, btiernay btier...@hotmail.com wrote:
I
I find importing a working SBT project into IntelliJ is the way to go.
How did you load the project into intellij?
On Jan 13, 2015, at 4:45 PM, Enno Shioji eshi...@gmail.com wrote:
Had the same issue. I can't remember what the issue was but this works:
libraryDependencies ++= {
it just process them?
Asim
--
jay vyas
.
Is this something I just have to live with when using the REPL? Or is this
covered by something bigger that's being addressed?
Thanks in advance
-Jay
Found a problem in the spark-shell, but can't confirm that it's related to
open issues on Spark's JIRA page. I was wondering if anyone could help
identify if this is an issue or if it's already being addressed.
Test: (in spark-shell)
case class Person(name: String, age: Int)
val peopleList =
https://github.com/jayunit100/SparkStreamingCassandraDemo
On this note, I've built a framework which is mostly pure so that functional
unit tests can be run composing mock data for Twitter statuses, with just
regular junit... That might be relevant also.
I think at some point we should come
Here's an example of a Cassandra etl that you can follow which should exit on
its own. I'm using it as a blueprint for revolving spark streaming apps on top
of.
For me, I kill the streaming app with System.exit after a sufficient amount of
data is collected.
That seems to work for most any
I'm trying to implement a graph algorithm that does a form of path
searching. Once a certain criteria is met on any path in the graph, I
wanted to halt the rest of the iterations. But I can't see how to do that
with the Pregel API, since any vertex isn't able to know the state of other
arbitrary
is a must. (And soon we will have that on the metrics report
as well!! :-) )
-kr, Gerard.
On Thu, Nov 27, 2014 at 8:14 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi TD,
I am using Spark Streaming to consume data from Kafka and do some
aggregation and ingest the results into RDS. I
Just add one more point. If Spark streaming knows when the RDD will not be
used any more, I believe Spark will not try to retrieve data it will not
use any more. However, in practice, I often encounter the error of cannot
compute split. Based on my understanding, this is because Spark cleared
out
this, is by
asking Spark Streaming remember stuff for longer, by using
streamingContext.remember(duration). This will ensure that Spark
Streaming will keep around all the stuff for at least that duration.
Hope this helps.
TD
On Wed, Nov 26, 2014 at 5:07 PM, Bill Jay bill.jaypeter...@gmail.com
if one can point an example library and how to run it :)
Thanks
--
jay vyas
On Sun, Nov 23, 2014 at 2:13 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am using Spark to consume from Kafka. However, after the job has run
for several hours, I saw the following failure of an executor:
kafka.common.ConsumerRebalanceFailedException:
group-1416624735998_ip-172-31
Hi all,
I am using Spark to consume from Kafka. However, after the job has run for
several hours, I saw the following failure of an executor:
kafka.common.ConsumerRebalanceFailedException:
group-1416624735998_ip-172-31-5-242.ec2.internal-1416648124230-547d2c31
can't rebalance after 4 retries
This seems pretty standard: your IntelliJ classpath isn't matched to the
correct ones that are used in spark shell
Are you using the SBT plugin? If not how are you putting deps into IntelliJ?
On Nov 20, 2014, at 7:35 PM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.INVALID wrote:
There's a parameter in KafkaUtils.createStream with which you can specify
the Spark parallelism. Also, what is the exception stack?
Thanks
Jerry
*From:* Bill Jay [mailto:bill.jaypeter...@gmail.com]
*Sent:* Tuesday, November 18, 2014 2:47 AM
*To:* Helena Edelson
*Cc:* Jay Vyas; u
Hi all,
I am running a Spark Streaming job. It was able to produce the correct
results up to some time. Later on, the job was still running but producing
no result. I checked the Spark streaming UI and found that 4 tasks of a
stage failed.
The error messages showed that Job aborted due to stage
On Thu, Nov 13, 2014 at 5:00 AM, Helena Edelson helena.edel...@datastax.com
wrote:
I encounter no issues with streaming from kafka to spark in 1.1.0. Do you
perhaps have a version conflict?
Helena
On Nov 13, 2014 12:55 AM, Jay Vyas jayunit100.apa...@gmail.com wrote:
Yup , very important
that only after all the batch data was in?
Thanks
--
jay vyas
--
jay vyas
Hi all,
I have a Spark streaming job which constantly receives messages from Kafka.
I was using Spark 1.0.2 and the job had been running for a month. However,
now that I am using Spark 1.1.0, the Spark streaming job cannot
receive any messages from Kafka. I have not made any change to the
in Spark Streaming
example, you can try that. I’ve tested with latest master, it’s OK.
Thanks
Jerry
*From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
*Sent:* Thursday, November 13, 2014 8:45 AM
*To:* Bill Jay
*Cc:* u...@spark.incubator.apache.org
*Subject:* Re: Spark streaming cannot
be local[n], n > 1 for
local mode. Beside there’s a Kafka wordcount example in Spark Streaming
example, you can try that. I’ve tested with latest master, it’s OK.
Thanks
Jerry
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Thursday, November 13, 2014 8:45 AM
To: Bill Jay
Cc: u
A use case would be helpful?
Batches of RDDs from Streams are going to have temporal ordering in terms of
when they are processed in a typical application ... , but maybe you could
shuffle the way batch iterations work
On Nov 3, 2014, at 11:59 AM, Josh J joshjd...@gmail.com wrote:
When
Hi all,
I have a spark streaming job that consumes data from Kafka and produces
some simple operations on the data. This job is run in an EMR cluster with
10 nodes. The batch size I use is 1 minute and it takes around 10 seconds
to generate the results that are inserted to a MySQL database.
of filtering the data collection
times
the #buckets?
thanks, Gerard.
--
jay vyas
--
jay vyas
Hi Spark! I found out why my RDDs weren't coming through in my Spark
stream.
It turns out that onStart() needs to return, it seems, i.e. you
need to launch the worker part of your
start process in a thread. For example
def onStartMock():Unit ={
val future = new Thread(new
, Oct 21, 2014 at 11:02 AM, jay vyas jayunit100.apa...@gmail.com
wrote:
Hi Spark! I found out why my RDDs weren't coming through in my Spark
stream.
It turns out that onStart() needs to return, it seems, i.e. you
need to launch the worker part of your
start process in a thread
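
The "start must return, so do the work in a thread" pattern described above can be sketched in plain Python. This is a minimal model only: `MockReceiver` and its methods are hypothetical stand-ins and not Spark's Receiver API.

```python
import threading
import time


class MockReceiver:
    """Hypothetical stand-in for a receiver; not Spark's Receiver API."""

    def __init__(self, items):
        self._items = items
        self.stored = []
        self._worker = None

    def on_start(self):
        # Hand the receive loop to a background thread and return at once,
        # so the caller is never blocked by the loop.
        self._worker = threading.Thread(target=self._receive, daemon=True)
        self._worker.start()

    def _receive(self):
        for item in self._items:
            time.sleep(0.01)  # simulate waiting on an external source
            self.stored.append(item)

    def join(self):
        # Wait for the background loop to finish (useful in tests).
        self._worker.join()


receiver = MockReceiver(["a", "b", "c"])
receiver.on_start()   # returns immediately
receiver.join()       # wait for the worker before inspecting results
```

If `on_start` ran the loop inline instead, the caller would stay blocked until the source was exhausted, which matches the symptom of nothing coming through.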
be
very
useful.
--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
--
jay vyas
Hi Spark!
I don't yet understand the semantics of RDDs in a streaming context
very well.
Are there any examples of how to implement CustomInputDStreams, with
corresponding Receivers, in the docs?
I've hacked together a custom stream, which is being opened and is
consuming data
=pyStreamingSparkRDDPipe”)
data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)
def echo(data):
    # Output winds up in the shell console in my cluster
    # (i.e. the machine I launched pyspark from).
    print("python received: %s" % (data))
rdd.foreach(echo)
print("we are done")
--
jay vyas
AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
--
jay vyas
suggested:
val predictions = model.predict(data.map(x => (x.user, x.product)))
Best,
Xiangrui
On Thu, Aug 7, 2014 at 1:20 PM, Burak Yavuz bya...@stanford.edu wrote:
Hi Jay,
I've had the same problem you've been having in Question 1 with a
synthetic dataset. I thought I wasn't producing
for the long ramble.
Jay
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Testing-JUnit-with-Spark-tp10861.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
jay vyas
, 2014 at 10:05 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Can you give an idea of the streaming program? Rest of the transformation
you are doing on the input streams?
On Tue, Jul 22, 2014 at 11:05 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am currently running
ak...@sigmoidanalytics.com
wrote:
Can you paste the piece of code?
Thanks
Best Regards
On Wed, Jul 23, 2014 at 1:22 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I am running a spark streaming job. The job hangs on one stage, which
shows as follows:
Details for Stage 4
Hi all,
I have a question regarding Spark streaming. When we use the
saveAsTextFiles function and my batch is 60 seconds, Spark will generate a
series of files such as:
result-140614896, result-140614802, result-140614808, etc.
I think this is the timestamp for the beginning of each
:39 AM, Bill Jay bill.jaypeter...@gmail.com
wrote:
Hi all,
I have a question regarding Spark streaming. When we use the
saveAsTextFiles function and my batch is 60 seconds, Spark will generate a
series of files such as:
result-140614896, result-140614802, result-140614808, etc