The correct link is
https://docs.databricks.com/spark/latest/spark-sql/index.html .
This link does have the core syntax, such as the BNF for the DDL, DML, and
SELECT. It does *not* have a reference for the date / string / numeric
functions: is there any such reference at this point? It is not
You could exit with an error code just like a normal Java/Scala application, and
get it from the driver/YARN.
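For illustration, a rough sketch of that approach (runPipeline and the other names are placeholders for the real job logic):

    import org.apache.spark.sql.SparkSession

    object MyJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("my-job").getOrCreate()
        try {
          runPipeline(spark)          // hypothetical helper with the actual work
          spark.stop()
          sys.exit(0)                 // zero exit code -> application reported as SUCCEEDED
        } catch {
          case e: Exception =>
            e.printStackTrace()
            spark.stop()
            sys.exit(1)               // non-zero exit code -> reported as FAILED by the driver/YARN
        }
      }
    }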
On Fri, Aug 11, 2017 at 9:55 AM, Wei Zhang
wrote:
> I suppose you can find the job status from the YARN UI application view.
>
>
>
> Cheers,
>
> -z
>
>
>
> *From:* 陈宇航
I suppose you can find the job status from the YARN UI application view.
Cheers,
-z
From: 陈宇航 [mailto:yuhang.c...@foxmail.com]
Sent: Thursday, August 10, 2017 5:23 PM
To: user
Subject: How can I tell if a Spark job is successful or not?
I want to do some clean-ups after a
Hi,
I am facing issues while trying to recover a textFileStream from checkpoint.
Basically, it is trying to load the files from the beginning of the job start,
whereas I am deleting the files after processing them. I have the following
configs set, so I was thinking that it should not look for files
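For context, the recovery setup follows the usual getOrCreate pattern, roughly like this (a sketch; paths and intervals are placeholders, not the real config):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/fileStreamApp"     // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("file-stream")
      val ssc = new StreamingContext(conf, Seconds(60))
      ssc.checkpoint(checkpointDir)
      // watch a directory for new files; on recovery the stream's bookkeeping
      // about already-seen files is restored from the checkpoint
      val lines = ssc.textFileStream("hdfs:///incoming")        // placeholder path
      lines.foreachRDD(rdd => println(rdd.count()))
      ssc
    }

    // rebuilds the context from the checkpoint on restart instead of calling createContext again
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()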
I refer to docs.databricks.com/Spark/latest/Spark-sql/index.html.
Cheers
Jules
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Aug 10, 2017, at 1:46 PM, Stephen Boesch wrote:
>
>
> While the DataFrame/DataSets are useful in many circumstances they are
>
Yes, it's used more in Hive than in Spark.
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Aug 10, 2017, at 2:24 PM, Sathish Kumaran Vairavelu
> wrote:
>
> I think it is for the Hive dependency.
>> On Thu, Aug 10, 2017 at 4:14 PM kant kodali
I think it is for the Hive dependency.
On Thu, Aug 10, 2017 at 4:14 PM kant kodali wrote:
> Since I see a Calcite dependency in Spark, I wonder where Calcite is being
> used?
>
> On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu <
> vsathishkuma...@gmail.com> wrote:
>
>>
Since I see a Calcite dependency in Spark, I wonder where Calcite is being
used?
On Thu, Aug 10, 2017 at 1:30 PM, Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:
> Spark SQL doesn't use Calcite
>
> On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote:
>
>> Hi All,
While the DataFrame/DataSets are useful in many circumstances, they are
cumbersome for many types of complex SQL queries.
Is there an up-to-date *SQL* reference - i.e. not DataFrame DSL operations
- for version 2.2?
An example of what is not clear: what constructs are supported within
Spark SQL doesn't use Calcite
On Thu, Aug 10, 2017 at 3:14 PM kant kodali wrote:
> Hi All,
>
> Does Spark SQL use Calcite? If so, what for? I thought Spark SQL has
> Catalyst, which would generate its own logical plans, physical plans, and
> other optimizations.
>
>
Hi All,
Does Spark SQL use Calcite? If so, what for? I thought Spark SQL has
Catalyst, which would generate its own logical plans, physical plans, and
other optimizations.
Thanks,
Kant
Got the answer from
https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/ETCZdCcaKq8
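For anyone landing here later: with the spark-cassandra-connector the contact points go in spark.cassandra.connection.host as a comma-separated list, roughly like this (a sketch; the host addresses are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("cassandra-app")
      // all three nodes as initial contact points; the connector discovers the rest of the ring
      .config("spark.cassandra.connection.host", "10.0.0.1,10.0.0.2,10.0.0.3")
      .getOrCreate()

    // equivalently on the command line:
    //   spark-submit --conf spark.cassandra.connection.host=10.0.0.1,10.0.0.2,10.0.0.3 ...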
On Thu, Aug 10, 2017 at 11:59 AM, shyla deshpande
wrote:
> I have a 3-node Cassandra cluster. I want to pass all 3 nodes in spark
> submit. How do I
I have a 3-node Cassandra cluster. I want to pass all 3 nodes in spark-submit.
How do I do that?
Any code samples will help.
Thanks
Thanks Cody.
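For anyone finding this later, roughly how those methods get used (a sketch assuming the spark-streaming-kafka 0.8 connector; the broker/topic names are placeholders and exact types may differ between versions):

    import kafka.common.TopicAndPartition
    import org.apache.spark.streaming.kafka.KafkaCluster

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // placeholder broker
    val kc = new KafkaCluster(kafkaParams)

    val partitions = Set(TopicAndPartition("mytopic", 0), TopicAndPartition("mytopic", 1))

    // both calls return an Either: Left(errors) or Right(offset per partition)
    kc.getEarliestLeaderOffsets(partitions) match {
      case Right(offsets) => offsets.foreach { case (tp, lo) => println(s"$tp earliest=${lo.offset}") }
      case Left(errs)     => errs.foreach(println)
    }
    kc.getLatestLeaderOffsets(partitions) match {
      case Right(offsets) => offsets.foreach { case (tp, lo) => println(s"$tp latest=${lo.offset}") }
      case Left(errs)     => errs.foreach(println)
    }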
On Wed, Aug 9, 2017 at 8:46 AM, Cody Koeninger wrote:
> org.apache.spark.streaming.kafka.KafkaCluster has methods
> getLatestLeaderOffsets and getEarliestLeaderOffsets
>
> On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande
> wrote:
> >
Hi,
I have a Spark Streaming task that basically does the following:
1. Read a batch using a custom receiver
2. Parse and apply transforms to the batch
3. Convert the raw fields to a bunch of features
4. Use a pre-built model to predict the class of each record in
I want to do some clean-ups after a Spark job is finished, and the action I
would do depends on whether the job is successful or not.
So how/where can I get the result of the job?
I already tried the SparkListener; it worked fine when the job is successful,
but if the job fails, the listener
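For reference, the listener approach mentioned above looks roughly like this (illustrative sketch only; `spark` is assumed to be the active SparkSession):

    import org.apache.spark.scheduler.{JobSucceeded, SparkListener, SparkListenerJobEnd}

    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
        if (jobEnd.jobResult == JobSucceeded) {
          // per-job success: a place to hook job-level clean-up
        } else {
          // anything else means that job failed
        }
      }
    })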
Yeah, installing HDFS in our environment is unfortunately going to take a lot of
time (approvals/planning etc.). I will have to live with the local FS for now.
The other option I had already tried is collect() to send everything to the driver
node. But my data volume is too large for the driver node to handle
Also, why are you trying to write results locally if you're not using a
distributed file system? Spark is geared towards writing to a distributed file
system. I would suggest trying collect() so the data is sent to the master
and then doing the write if the result set isn't too big, or
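Roughly, the collect-and-write-on-the-driver approach looks like this (a sketch; resultDf and the output path are placeholders, and it is only viable when the result comfortably fits in driver memory):

    import java.io.PrintWriter

    val rows = resultDf.collect()                      // pulls the whole result to the driver
    val out  = new PrintWriter("/tmp/result.csv")      // local path on the driver machine
    try {
      rows.foreach(row => out.println(row.mkString(","))) // naive CSV, no quoting/escaping
    } finally {
      out.close()
    }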
Yes, I have tried with file:/// and the full path, as well as just the full path
without the file:/// prefix.
The Spark session has been closed; no luck though ☹
Regards,
Hemanth
From: Femi Anthony
Date: Thursday, 10 August 2017 at 11.06
To: Hemanth Gudela
Is your filePath prefaced with file:/// and the full path, or is it relative?
You might also try calling close() on the Spark context or session at the end of
the program execution to try and ensure that cleanup is completed.
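For example, roughly (a sketch; the path is a placeholder and `spark` is the active session):

    // write with an explicit local-filesystem URI rather than a relative path
    myDataFrame.write.mode("overwrite").csv("file:///tmp/output/myFilePath")

    // stop the session when the application is done so output commit/cleanup can finish
    spark.stop()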
Sent from my iPhone
> On Aug 10, 2017, at 3:58 AM, Hemanth Gudela
Thanks for the reply, Femi!
I’m writing the file like this -->
myDataFrame.write.mode("overwrite").csv("myFilePath")
There are absolutely no errors/warnings after the write.
A _SUCCESS file is created on the master node, but the _temporary problem is
noticed only on the worker nodes.
I know
Normally the *_temporary* directory gets deleted as part of the cleanup
when the write is complete and a _SUCCESS file is created. I suspect that
the writes are not properly completed. How are you specifying the write?
Any error messages in the logs?
On Thu, Aug 10, 2017 at 3:17 AM, Hemanth
Hi,
I’m running Spark in cluster mode on 4 nodes, and trying to write CSV
files to the nodes’ local path (not HDFS).
I’m using spark.write.csv to write the CSV files.
On the master node:
spark.write.csv creates a folder with the CSV file name and writes many files with
a part-r-000n suffix. This is okay for
Hi Jose,
Just to note that in the Databricks blog they state that they compute the
top-5 singular vectors, not all singular values/vectors. Computing all of them is
much more computationally intensive.
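For reference, asking for only the top-k factors looks roughly like this (a sketch; `rows` is assumed to be an existing RDD of mllib Vectors):

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val mat = new RowMatrix(rows)                 // rows: RDD[org.apache.spark.mllib.linalg.Vector]
    val svd = mat.computeSVD(5, computeU = true)  // only the 5 largest singular values/vectors
    println(svd.s)                                // the 5 singular values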
Cheers,
Anastasios
On 09.08.2017 at 15:19, "Jose Francisco Saray Villamizar" <
jsa...@gmail.com> wrote: