Thanks, TD! And sorry for the typo. It's very helpful to know that whatever I was achieving with DStreams I can also achieve with Structured Streaming.
It seems there was some other error in my code, which I fixed, and it seems to be working fine now!
Thanks again!
On Thu, Sep 14,
There are expected to be about 5 million UUIDs in a day. I need to use this
field to drop duplicate records and count them. If I simply count records
without using dropDuplicates it occupies less than 1 GB of memory. I
believe most of the memory is occupied by the state store for keeping the
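(For reference, a rough sketch, not from the original thread, of the usual way to bound deduplication state with a watermark; the rate source, column names and thresholds below are placeholders, and `spark` is assumed to be a SparkSession as in spark-shell.)

import org.apache.spark.sql.functions._
import spark.implicits._

// Stand-in streaming source; any source that yields a UUID field and an event-time column works.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()                                              // columns: timestamp, value
  .selectExpr("CAST(value AS STRING) AS uuid", "timestamp AS eventTime")

// The watermark bounds how long deduplication state is kept, so the state store
// does not have to hold every UUID ever seen during the day.
val dailyCounts = events
  .withWatermark("eventTime", "1 hour")
  .dropDuplicates("uuid", "eventTime")
  .groupBy(window($"eventTime", "1 day"))
  .count()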
Hi all,
Could anyone assist, please?
I am trying to flatMap a Dataset[(String, String)] and I am getting errors
in Eclipse. The errors are more Scala-related than Spark-related, but I was
wondering if someone has come across a similar situation.
Here's what I've got: a Dataset of (String, String), out of which I
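(For reference, a minimal sketch that compiles, assuming the intent is to split the second element of each pair into words; the actual transformation is not shown in the original mail. The most common cause of Scala-level errors here is a missing import spark.implicits._, which supplies the Encoder that Dataset.flatMap needs.)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flatMapExample").master("local[*]").getOrCreate()
// spark.implicits._ provides the Encoders that Dataset.flatMap requires; forgetting
// this import is a very common cause of "Unable to find encoder for type ..." errors.
import spark.implicits._

val ds = Seq(("k1", "a b c"), ("k2", "d e")).toDS()   // Dataset[(String, String)]

// Illustrative transformation: emit one (key, word) pair per word in the value.
val flattened = ds.flatMap { case (key, value) =>
  value.split("\\s+").map(word => (key, word))
}

flattened.show()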
How many UUIDs do you expect to have in a day? That is likely where all
the memory is being used. Does it work without that?
On Tue, Sep 12, 2017 at 8:42 PM, 张万新 wrote:
> Yes, my code is shown below (I also posted my code in another mail).
> /**
> * input
> */
>
Can you show the explain() for the version that doesn't work? You might
just be hitting a bug.
On Tue, Sep 12, 2017 at 9:03 PM, 张万新 wrote:
> It seems current_timestamp() cannot be used directly in the window function,
> because after some attempts I found that using
>
>
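(For reference, a rough sketch, not from the original thread, of the usual alternative: window on an event-time column carried by the data rather than on current_timestamp(). The rate source and intervals are placeholders, and `spark` is assumed to be a SparkSession.)

import org.apache.spark.sql.functions._
import spark.implicits._

val events = spark.readStream
  .format("rate")                        // placeholder source; columns: timestamp, value
  .option("rowsPerSecond", "10")
  .load()

// Group on an event-time column from the data, not on current_timestamp().
val counts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"))
  .count()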
try df.select($"col".cast(DoubleType))

import org.apache.spark.sql.types._
import spark.implicits._   // for the $"..." column syntax and toDF

// Minimal repro: a string column cast to double
val df = spark.sparkContext.parallelize(Seq(("1.04"))).toDF("c")
df.select($"c".cast(DoubleType))
On Thu, Sep 14, 2017 at 9:20 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I am getting below
I would probably suggest that you partition by format (though you can get
the file name from the built-in function input_file_name()). You can load
multiple streams from different directories and union them together as long
as the schema is the same after parsing. Otherwise you can just run
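(A rough sketch of the union-of-streams idea, assuming two input directories and a shared schema; the paths, schema and column names are illustrative only.)

import org.apache.spark.sql.functions.input_file_name
import org.apache.spark.sql.types._

// Illustrative schema and paths; the real ones depend on the data.
val schema = new StructType()
  .add("id", StringType)
  .add("value", DoubleType)

val csvStream = spark.readStream.schema(schema).csv("/data/in/csv")
  .withColumn("source_file", input_file_name())
val jsonStream = spark.readStream.schema(schema).json("/data/in/json")
  .withColumn("source_file", input_file_name())

// As long as the parsed schema matches, the streams can be unioned into one.
val combined = csvStream.union(jsonStream)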
Hi,
I am getting the below error when trying to cast a column value from a Spark
dataframe to double. Any ideas? I tried many solutions but none of them
worked.
java.lang.ClassCastException: java.lang.String cannot be cast to
java.lang.Double
1. row.getAs[Double](Constants.Datapoint.Latitude)
2.
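(For what it's worth, two hedged workarounds, assuming the latitude column is physically a StringType holding a numeric value. The DataFrame and helper names below are made up; Constants.Datapoint.Latitude is taken from the original mail.)

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DoubleType

// Hypothetical helper: cast the string column to double up front, so that a later
// row.getAs[Double](...) really does see a Double.
def withDoubleColumn(df: DataFrame, colName: String): DataFrame =
  df.withColumn(colName, df(colName).cast(DoubleType))

// Alternative at the Row level: read the value as the String it actually is, then convert:
//   row.getAs[String](Constants.Datapoint.Latitude).toDouble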
On 14 September 2017 at 10:42, wrote:
> val noTs = myData.map(dropTimestamp)
>
> val scaled = scaler.transform(noTs)
>
> val projected = (new RowMatrix(scaled)).multiply(principalComponents).rows
>
> val clusters = myModel.predict(projected)
>
> val result =
Usually spark.ml models specify the columns they use for training, i.e. you
would only select your feature columns (X) for model training, but metadata,
e.g. target labels (y) or your date column, would still be present in each row.
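(A small sketch of that idea with the DataFrame-based API, not from the original mail: only the assembled feature columns are used for clustering, while the timestamp simply stays in each row, so no zip-by-order is needed. Column names and data are invented.)

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._

// Hypothetical input: feature columns x1, x2 plus a timestamp that is only metadata.
val rawData = Seq(
  ("2017-09-14 10:00:00", 1.0, 2.0),
  ("2017-09-14 10:01:00", 3.0, 4.0)
).toDF("eventTime", "x1", "x2")

// Only the feature columns go into the vector; eventTime stays in the row.
val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("features")
val assembled = assembler.transform(rawData)

val model = new KMeans().setK(2).setFeaturesCol("features").fit(assembled)

// transform() appends a "prediction" column, so the timestamp and the prediction
// line up per row without any zipping.
val predicted = model.transform(assembled).select("eventTime", "prediction")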
wrote on Thu, 14 Sep 2017 at 10:42:
In several situations I would like to zip RDDs knowing that their order
matches. In particular I'm using an MLlib KMeansModel on an RDD of Vectors, so I
would like to do:
myData.zip(myModel.predict(myData))
Also the first column in my RDD is a timestamp which I don’t want to be a part
of the
(Sorry Mehmet, I'm only now seeing your first reply with the link to SO; it had
gone to my spam folder first :-/ )
On 2017-09-14 10:02 CEST, GRANDE Johan Ext DTSI/DSI wrote:
Well if the order cannot be guaranteed in case of a failure (or at all since
failure can happen transparently), what
Well if the order cannot be guaranteed in case of a failure (or at all since
failure can happen transparently), what does it mean to sort an RDD (method
sortBy)?
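(As an aside, not from the original thread: if downstream logic needs to depend on the sorted order, one hedged option is to make the order explicit in the data right after the sort, for example:)

// sortBy produces a total order across the output partitions; zipWithIndex then
// attaches that position to each element, so downstream logic can key on the
// index explicitly instead of relying on implicit iteration order.
val data = sc.parallelize(Seq(5, 3, 9, 1))
val indexed = data.sortBy(identity).zipWithIndex()   // RDD[(Int, Long)], in sorted order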
On 2017-09-14 03:36 CEST mehmet.su...@gmail.com wrote:
I think it is one of the conceptual differences in Spark compared to other
Are you sure the code is correct? A Dataset does not have a method
"trigger". Rather I believe the correct code should be:
StreamingQuery query = resultDataSet.writeStream.trigger(
ProcessingTime(1000)).format("kafka").start();
You can do all the same things you can do with Structured Streaming
Hi,
Using Spark 2.1.1.2.6-1.0-129 (from the Hortonworks distro), Scala 2.11.8
and Java 1.8.0_60.
I have a NiFi flow that produces more records than the Spark stream can
process within the batch time. To avoid the Spark queue overflowing I wanted
to try Spark Streaming backpressure (it did not work for me), so back to the more
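(For reference, the settings usually involved when experimenting with backpressure; a hedged sketch only, since which ceiling applies depends on whether the stream is receiver-based or a Kafka direct stream, and the rate values are placeholders.)

import org.apache.spark.SparkConf

// Backpressure lets the streaming engine adapt the ingestion rate to how fast
// batches actually complete; the maxRate settings act as hard ceilings on top of it.
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.receiver.maxRate", "1000")              // receiver-based streams
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")     // Kafka direct streams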