Hi folks,
1) First Problem:
I'm querying MySQL. I submit a query like this:
out = wam.select('message_id', 'business_id', 'info', 'entered_system_date', 'auto_update_time') \
    .filter("auto_update_time >= '2020-04-01 05:27'") \
    .dropDuplicates(['message_id', 'auto_update_time'])
But what I see in the
Hi Jane
Try this example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
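In short, that example watches an HDFS directory for new files and does a
word count on each batch. Roughly (the monitored directory and the batch
interval here are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("HdfsWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(10))

// count words in new text files appearing under the monitored directory
val lines = ssc.textFileStream("hdfs://...")
val wordCounts = lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
wordCounts.print()

ssc.start()
ssc.awaitTermination()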
Som
On Tue, 31 Mar 2020, 21:34 jane thorpe, wrote:
> hi,
>
> Are there setup instructions on the website for
>
hi,
Are there setup instructions on the website for
spark-3.0.0-preview2-bin-hadoop2.7? I can run the same program for the HDFS format:
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
Sorry, I misrepresented the question as well. Thanks for your great help.
What I want is to keep the time zone information as it is, e.g.
2020-04-11T20:40:00-05:00, in the timestamp datatype, so I can write it to the
downstream application as is. I can correct the missing colon in the UTC offset info.
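For illustration, one way to hand the value downstream with the offset
restored (a sketch only; spark-shell and Spark 2.4-style SimpleDateFormat
patterns assumed, and "America/Chicago" stands in for whatever zone the
downstream expects):

import org.apache.spark.sql.functions.{to_timestamp, date_format}

// The TimestampType value itself does not keep the original -05:00 offset,
// so pin the session time zone and format the offset back out explicitly.
spark.conf.set("spark.sql.session.timeZone", "America/Chicago")

val df = Seq("2020-04-11T20:40:00-0500").toDF("value")
  .withColumn("ts", to_timestamp($"value", "yyyy-MM-dd'T'HH:mm:ssZ"))   // "Z" matches -0500
  .withColumn("out", date_format($"ts", "yyyy-MM-dd'T'HH:mm:ssXXX"))    // emits -05:00

df.show(false)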
On Tue, Mar 31, 2020 at 1:15 PM
And to answer your question (sorry, read too fast): the string is not in
proper ISO 8601. The extended form must be used throughout, i.e.
2020-04-11T20:40:00-05:00; a colon (:) is missing in the UTC offset info.
br,
Magnus
On Tue, Mar 31, 2020 at 7:11 PM Magnus Nilsson wrote:
> Timestamps
Timestamps aren't timezoned. If you parse ISO 8601 strings they will be
converted to UTC automatically.
If you parse timestamps without a timezone they will be converted using the
timezone of the server Spark is running on. You can change the timezone
Spark uses with the spark.sql.session.timeZone configuration.
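A minimal sketch of that behaviour (spark-shell and Spark 2.4-style
SimpleDateFormat patterns assumed): the parsed value is stored as a plain
instant, and the session time zone only changes how it is rendered.

import org.apache.spark.sql.functions.to_timestamp

// "XXX" matches an offset written with a colon, e.g. -05:00
val df = Seq("2020-04-11T20:40:00-05:00").toDF("value")
  .withColumn("ts", to_timestamp($"value", "yyyy-MM-dd'T'HH:mm:ssXXX"))

spark.conf.set("spark.sql.session.timeZone", "UTC")
df.show(false)  // ts rendered as 2020-04-12 01:40:00

spark.conf.set("spark.sql.session.timeZone", "America/Chicago")
df.show(false)  // same instant, rendered as 2020-04-11 20:40:00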
Hi Spark Users,
I am losing the timezone value from the format below. I tried a couple of
formats but was not able to make it work. Can someone shed some light?
scala> val sampleDF = Seq("2020-04-11T20:40:00-0500").toDF("value")
sampleDF: org.apache.spark.sql.DataFrame = [value: string]
scala>
I'm not a software engineer by training and I hope that there's an existing
best practice for the problem I'm trying to solve. I'm using Spark 2.4.5,
Hadoop 2.7, Hive 1.2.
I have a large table (terabytes) from an external source (which is beyond
my control) where the data is stored in a key-value
That seems to come from the difference in how Spark infers the schema and how
it creates the serializer / deserializer for Java beans when constructing the
bean encoder.
When inferring the schema for Java beans, all properties which have getter
methods are considered. When creating the serializer / deserializer, only
properties with both getter and setter methods are considered.
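For illustration, a minimal sketch of a bean in that situation (only the class
name is taken from this thread; the fields are made up):

import org.apache.spark.sql.Encoders
import scala.beans.BeanProperty

// @BeanProperty generates both getters and setters, so schema inference and
// the serializer/deserializer agree on the same set of properties.
class ProductSessionInformation {
  @BeanProperty var productId: String = _
  @BeanProperty var startTime: Long = _
  @BeanProperty var endTime: Long = _

  // A getter-only derived property like this is seen during schema inference
  // but has no setter for the deserializer, which is the mismatch described
  // above; computing it outside the bean avoids the problem.
  // def getDuration: Long = endTime - startTime
}

val enc = Encoders.bean(classOf[ProductSessionInformation])
println(enc.schema.treeString)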
Never mind. It got resolved after I removed the two extra getter methods (to
calculate duration) that I had created in my state-specific Java bean
(ProductSessionInformation). But I am surprised that it caused so many
problems. I guess when this bean is converted to a Scala class it may not be
taking care of