Hi,
I have been trying to distribute Kafka topics among different instances of
the same consumer group. I am using the Kafka direct stream API for creating DStreams.
After the second consumer group comes up, Kafka does a partition rebalance, and
then the Spark driver of the first consumer dies with the
Hi Guys
Quick one: how does Spark deal with (i.e. create partitions for) large files sitting
on NFS, assuming all executors can see the file in exactly the same way?
ie, when I run
r = sc.textFile("file:///my/file")
what happens if the file is on NFS?
is there any difference from
r =
The CSV data source allows you to skip invalid lines; this should also include
lines that have more than maxColumns columns. Choose mode "DROPMALFORMED".
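A minimal sketch of that suggestion, assuming Spark 2.x (the input path is hypothetical; whether DROPMALFORMED also covers rows that exceed maxColumns was the open question in this thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-drop-malformed").getOrCreate()

// DROPMALFORMED silently drops rows the parser cannot handle.
val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .option("maxColumns", "20480") // parser-level cap on columns per row
  .csv("/path/to/input.csv")     // hypothetical path
```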
> On 8. Jun 2017, at 03:04, Chanh Le wrote:
>
> Hi Takeshi, Jörn Franke,
>
> The problem is even I increase the maxColumns it
Did you include the proper scala-reflect dependency?
On Wed, May 31, 2017 at 1:01 AM, krishmah wrote:
> I am currently using Spark 2.0.1 with Scala 2.11.8. However, the same code works
> with Scala 2.10.6. Please advise if I am missing something.
>
> import
I think you need to get the logger within the lambda; otherwise it's the
driver-side logger, which can't work.
On Wed, May 31, 2017 at 4:48 PM, Paolo Patierno wrote:
> No it's running in standalone mode as Docker image on Kubernetes.
>
>
> The only way I found was to
We use AsyncHttpClient (from the Java world) and simply call future.get as a
synchronous call.
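A sketch of that pattern, assuming the org.asynchttpclient library (AsyncHttpClient 2.x) is on the classpath; the endpoint and payload are hypothetical:

```scala
import org.asynchttpclient.Dsl.asyncHttpClient

val client = asyncHttpClient()
try {
  // execute() returns a ListenableFuture; get() blocks until the response
  // arrives, turning the asynchronous call into a synchronous one.
  val response = client
    .preparePost("http://example.com/ingest") // hypothetical endpoint
    .setBody("""{"key": "value"}""")
    .execute()
    .get()
  println(response.getStatusCode)
} finally {
  client.close()
}
```

In a streaming job this would typically run inside foreachPartition, with one client per partition rather than per record.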
On Thu, Jun 1, 2017 at 4:08 AM, vimal dinakaran wrote:
> Hi,
> In our application pipeline we need to push the data from spark streaming
> to a http server.
>
> I would like to have
1. Could you give the job, stage & task status from the Spark UI? I found it
extremely useful for performance tuning.
2. Use model.transform for predictions. Usually we have a pipeline for
preparing training data, and using the same pipeline to transform the data you
want to predict could give us the
I'd suggest scripts like JS, Groovy, etc. To my understanding, the service
loader mechanism isn't a good fit for runtime reloading.
On Wed, Jun 7, 2017 at 4:55 PM, Jonnas Li(Contractor) <
zhongshuang...@envisioncn.com> wrote:
> To be more explicit, I used mapwithState() in my application, just
If you use StringIndexer to index categorical data, IndexToString can convert
it back.
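A sketch of that round trip with Spark ML, assuming a DataFrame `df` with a string column named "category" (the column names are hypothetical):

```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// Index the string column into numeric category indices.
val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// Map the numeric indices back to the original string labels.
val converted = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)
  .transform(indexed)
```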
On Wed, Jun 7, 2017 at 6:14 PM, kundan kumar wrote:
> Hi Yan,
>
> This doesn't work.
>
> thanks,
> kundan
>
> On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai)
> wrote:
>
Hi Takeshi, Jörn Franke,
The problem is that even if I increase maxColumns, some lines still have
more columns than the limit I set, and that costs a lot of memory.
So I just want to skip any line that has more columns than the maxColumns I set.
Regards,
Chanh
On Thu, Jun 8, 2017 at 12:48 AM
A lot depends on your context as well. If I'm using Spark _for analysis_, I
frequently use Python; it's a starting point from which I can then
leverage pandas, matplotlib/seaborn, and other powerful tools available on
top of Python.
If the Spark outputs are the ends themselves, rather than the
Mich,
We use Scala for a large project. On our team we've set a few standards to
ensure readability (we try to avoid excessive use of tuples, use named
functions, etc.) Given these constraints, I find Scala to be very
readable, and far easier to use than Java. The Lambda functionality of
Java
Is it not enough to set `maxColumns` in CSV options?
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L116
// maropu
On Wed, Jun 7, 2017 at 9:45 AM, Jörn Franke wrote:
> Spark CSV data
I think this is a religious question ;-)
Java is often underestimated, because people are not aware of its lambda
functionality, which makes the code very readable. Scala: it depends on who
programs it. People coming from a typical Java background write Java-like code
in Scala, which might not be
Spark CSV data source should be able
> On 7. Jun 2017, at 17:50, Chanh Le wrote:
>
> Hi everyone,
> I am using Spark 2.1.1 to read csv files and convert to avro files.
> One problem that I am facing is if one row of csv file has more columns than
> maxColumns (default is
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 5:15 AM
To: 颜发才(Yan Facai)
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi Yan,
This doesn't work.
From: 颜发才(Yan Facai) [mailto:facai@gmail.com]
Sent: Wednesday, June 7, 2017 4:24 AM
To: kundan kumar
Cc: spark users
Subject: Re: Convert the feature vector to raw data
Hi, kumar.
How about removing the
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 4:01 AM
To: spark users
Subject: Convert the feature vector to raw data
I am using
Dataset result =
Hi everyone,
I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
One problem that I am facing is that if one row of the CSV file has more
columns than maxColumns (default is 20480), the parsing process stops.
Internal state when error was thrown: line=1, column=3, record=0,
I changed the data structure to scala.collection.immutable.Set and I still
see the same issue. My key is a String. I do the following in my reduce
and invReduce:
visitorSet1 ++ visitorSet2.toTraversable
visitorSet1 -- visitorSet2.toTraversable
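In plain Scala (no Spark needed), those two operations behave as set union and set difference; a small illustration with String keys:

```scala
// Immutable sets of String keys, mirroring the reduce / invReduce snippet.
val visitorSet1 = Set("a", "b", "c")
val visitorSet2 = Set("b", "d")

val merged  = visitorSet1 ++ visitorSet2 // union
val removed = visitorSet1 -- visitorSet2 // difference
```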
On Tue, Jun 6, 2017 at 8:22 PM, Tathagata Das
Hi,
I am a fan of Scala and functional programming hence I prefer Scala.
I had a discussion with a hardcore Java programmer and a data scientist who
prefers Python.
Their view is that in collaborative work using Scala it is
almost impossible to understand someone else's Scala
Thanks, Doc. I saw this on another board yesterday, so I tried it by
first going to the directory where I've stored winutils.exe and then,
as an admin, running the command that you suggested, and I get this
exception when checking the permissions:
C:\winutils\bin>winutils.exe ls -F
Hi Curtis,
I believe on Windows the following command needs to be executed (winutils
needs to be installed):
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
On 6 June 2017 at 09:45, Curtis Burkhalter
wrote:
> Hello all,
>
> I'm new to Spark and I'm trying to
No, I don't.
Wed, Jun 7, 2017 at 16:42, Jean Georges Perrin :
> Do you have some other security in place like Kerberos or impersonation?
> It may affect your access.
>
>
> jg
>
>
> On Jun 7, 2017, at 02:15, Patrik Medvedev
> wrote:
>
> Hello guys,
>
> I
Do you have some other security in place like Kerberos or impersonation? It may
affect your access.
jg
> On Jun 7, 2017, at 02:15, Patrik Medvedev wrote:
>
> Hello guys,
>
> I need to execute hive queries on remote hive server from spark, but for some
> reasons
Hi Yan,
This doesn't work.
thanks,
kundan
On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai) wrote:
> Hi, kumar.
>
> How about removing the `select` in your code?
> namely,
>
> Dataset result = model.transform(testData);
> result.show(1000, false);
>
>
>
>
> On Wed, Jun 7,
Hi, kumar.
How about removing the `select` in your code?
namely,
Dataset result = model.transform(testData);
result.show(1000, false);
On Wed, Jun 7, 2017 at 5:00 PM, kundan kumar wrote:
> I am using
>
> Dataset result =
Hello guys,
I need to execute Hive queries on a remote Hive server from Spark, but for
some reason I receive only column names (without data).
The data is available in the table; I checked it via HUE and a Java JDBC connection.
Here is my code example:
val test = spark.read
.option("url",
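For reference, a hedged sketch of what a complete Spark JDBC read might look like; the URL, table name, and credentials are hypothetical, and the Hive JDBC driver must be on the classpath:

```scala
// Generic JDBC read; assumes an existing SparkSession named `spark`.
val test = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://remote-host:10000/default") // hypothetical URL
  .option("dbtable", "my_table")                           // hypothetical table
  .option("user", "hive")
  .option("password", "")
  .load()

test.show()
```

If a read like this still returns only column names, the Hive JDBC driver's handling of Spark-generated SQL is a common suspect, but that diagnosis is an assumption here.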
I am using
Dataset result = model.transform(testData).select("probability",
"label", "features");
result.show(1000, false);
In this case the feature vector is printed as output. Is there a way
that my original raw data gets printed instead of the feature vector, OR is
there a way to reverse
To be more explicit, I used mapWithState() in my application, just like this:
stream = KafkaUtils.createStream(..)
mappedStream = stream.mapPartitionsToPair(..)
stateStream = mappedStream.mapWithState(MyUpdateFunc(..))
stateStream.foreachRDD(..)
I call the jar in MyUpdateFunc(), and the jar
Agreed with Ayan.
Essentially, an edge node is a physical host or VM that is used by the
application to run the job. The users or service users start the process
from the edge node. Edge nodes are added to the cluster per environment,
for example DEV/TEST/UAT, etc.
An edge node normally has all compatible binaries in