Re: Is Structured streaming ready for production usage

2017-06-08 Thread Shixiong(Ryan) Zhu
Please take a look at http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html On Thu, Jun 8, 2017 at 4:46 PM, swetha kasireddy wrote: > OK. Can we use Spark Kafka Direct with Structured Streaming? > > On Thu, Jun 8, 2017 at 4:46 PM, swetha
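
The linked page documents the built-in Kafka source for Structured Streaming, which replaces the old "direct" DStream approach. A minimal sketch of what it looks like, assuming the spark-sql-kafka-0-10 package is on the classpath; the broker, topic, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("KafkaStructuredStreamingSketch")
  .getOrCreate()

// Read the Kafka topic as an unbounded streaming DataFrame.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092") // placeholder broker
  .option("subscribe", "events")                   // placeholder topic
  .load()

// Kafka keys/values arrive as binary; cast them to strings before use.
val query = stream
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
  .start()

query.awaitTermination()
```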

Re: Is Structured streaming ready for production usage

2017-06-08 Thread swetha kasireddy
OK. Can we use Spark Kafka Direct with Structured Streaming? On Thu, Jun 8, 2017 at 4:46 PM, swetha kasireddy wrote: > OK. Can we use Spark Kafka Direct as part of Structured Streaming? > > On Thu, Jun 8, 2017 at 3:35 PM, Tathagata Das

Re: Is Structured streaming ready for production usage

2017-06-08 Thread swetha kasireddy
OK. Can we use Spark Kafka Direct as part of Structured Streaming? On Thu, Jun 8, 2017 at 3:35 PM, Tathagata Das wrote: > YES. At Databricks, our customers have already been using Structured > Streaming and in the last month alone processed over 3 trillion records.

Re: Is Structured streaming ready for production usage

2017-06-08 Thread Tathagata Das
YES. At Databricks, our customers have already been using Structured Streaming and in the last month alone processed over 3 trillion records. https://databricks.com/blog/2017/06/06/simple-super-fast-streaming-engine-apache-spark.html On Thu, Jun 8, 2017 at 3:03 PM, SRK

Is Structured streaming ready for production usage

2017-06-08 Thread SRK
Hi, Is structured streaming ready for production usage in Spark 2.2? Thanks, Swetha

Re: problem initiating spark context with pyspark

2017-06-08 Thread Marco Mistroni
Try this link: http://letstalkspark.blogspot.co.uk/2016/02/getting-started-with-spark-on-window-64.html It helped me when I had similar problems with Windows. HTH. On Wed, Jun 7, 2017 at 3:46 PM, Curtis Burkhalter < curtisburkhal...@gmail.com> wrote: > Thanks Doc I saw this on another

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Ranadip Chatterjee
Looks like your session user does not have the required privileges on the remote HDFS directory that holds the Hive data. Since you get the columns, your session is able to read the metadata, so the connection to the remote HiveServer2 is successful. You should be able to find more

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Richard Moorhead
You might try pointing your Spark context at the Hive metastore via:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
// hive.metastore.uris expects the thrift:// scheme
conf.set("hive.metastore.uris", "thrift://your.thrift.server:9083")

val sparkSession = SparkSession.builder()
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()
```
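
Once the session is wired to the metastore this way, the table can be queried directly through Spark's Hive support instead of over JDBC. A short usage sketch (database and table names are hypothetical):

```scala
// Reads through the metastore, not JDBC; returns actual rows, not headers.
val test = sparkSession.sql("SELECT * FROM mydb.mytable")
test.show(10)
```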

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Даша Ковальчук
The result is count = 0. 2017-06-08 19:42 GMT+03:00 ayan guha : > What is the result of test.count()? > > On Fri, 9 Jun 2017 at 1:41 am, Даша Ковальчук > wrote: > >> Thanks for your reply! >> Yes, I tried this solution and had the same result.

Re: Read Data From NFS

2017-06-08 Thread ayan guha
Anyone? On Thu, 8 Jun 2017 at 3:26 pm, ayan guha wrote: > Hi Guys > > Quick one: how does Spark deal (i.e., create partitions) with large files sitting > on NFS, assuming all the executors can see the file in exactly the same way? > > i.e., when I run > > r = sc.textFile("file://my/file")
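
For what it's worth, textFile on a file:// URI goes through Hadoop's FileInputFormat just as on HDFS, so the file is split into byte-range partitions, and the minPartitions argument sets a lower bound on their number. A sketch, assuming every executor has the same NFS mount path:

```scala
// Path is a placeholder; it must resolve identically on every executor.
val r = sc.textFile("file:///my/file", minPartitions = 16)
println(r.getNumPartitions) // at least 16 for a sufficiently large file
```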

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread ayan guha
What is the result of test.count()? On Fri, 9 Jun 2017 at 1:41 am, Даша Ковальчук wrote: > Thanks for your reply! > Yes, I tried this solution and had the same result. Maybe you have another > solution or maybe I can execute query in another way on remote cluster? > >

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Takeshi Yamamuro
I filed a jira about this issue: https://issues.apache.org/jira/browse/SPARK-21024 On Thu, Jun 8, 2017 at 1:27 AM, Chanh Le wrote: > Can you recommend one? > > Thanks. > > On Thu, Jun 8, 2017 at 2:47 PM Jörn Franke wrote: > >> You can change the CSV
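
For reference, the limit under discussion is exposed as a CSV option, so one workaround until that JIRA lands is to raise the cap rather than swap parser libraries. A sketch with placeholder path and limit:

```scala
// maxColumns raises the underlying univocity parser's column cap
// (default 20480); as the thread notes, DROPMALFORMED alone does not
// catch this error, which is what SPARK-21024 tracks.
val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .option("maxColumns", "50000")
  .csv("/path/to/file.csv")
```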

Re: Question about mllib.recommendation.ALS

2017-06-08 Thread Sahib Aulakh [Search] ­
Many thanks. Will try it. On Thu, Jun 8, 2017 at 8:41 AM Nick Pentreath wrote: > Spark 2.2 will support the recommend-all methods in ML. > > Also, both ML and MLLIB performance has been greatly improved for the > recommend-all methods. > > Perhaps you could check out

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Даша Ковальчук
Thanks for your reply! Yes, I tried this solution and had the same result. Maybe you have another solution, or maybe I can execute the query in another way on the remote cluster? 2017-06-08 18:30 GMT+03:00 Даша Ковальчук : > Thanks for your reply! > Yes, I tried this solution

Re: Question about mllib.recommendation.ALS

2017-06-08 Thread Nick Pentreath
Spark 2.2 will support the recommend-all methods in ML. Also, both ML and MLLIB performance has been greatly improved for the recommend-all methods. Perhaps you could check out the current RC of Spark 2.2 or master branch to try it out? N On Thu, 8 Jun 2017 at 17:18, Sahib Aulakh [Search] ­ <
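
For reference, the ML-side recommend-all API being described looks roughly like this in the 2.2 RC; a minimal sketch, assuming a ratings DataFrame with user, item, and rating columns already exists:

```scala
import org.apache.spark.ml.recommendation.ALS

val als = new ALS()
  .setUserCol("user")
  .setItemCol("item")
  .setRatingCol("rating")

// `ratings` is an assumed DataFrame with the three columns above.
val model = als.fit(ratings)

// New in Spark 2.2: top-k recommendations for every user / every item.
val userRecs = model.recommendForAllUsers(10)
val itemRecs = model.recommendForAllItems(10)
```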

Re: Question about mllib.recommendation.ALS

2017-06-08 Thread Sahib Aulakh [Search] ­
Many thanks for your response. I already figured out the details with some help from another forum. 1. I was trying to predict ratings for all users and all products. This is inefficient and now I am trying to reduce the number of required predictions. 2. There is a nice example

Re: [Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Vadim Semenov
Have you tried running a query? Something like: ``` test.select("*").limit(10).show() ``` On Thu, Jun 8, 2017 at 4:16 AM, Даша Ковальчук wrote: > Hi guys, > > I need to execute hive queries on remote hive server from spark, but for > some reasons i receive only

Re: Worker node log not showed

2017-06-08 Thread Eike von Seggern
2017-05-31 10:48 GMT+02:00 Paolo Patierno : > No, it's running in standalone mode as a Docker image on Kubernetes. > > > The only way I found was to access the "stderr" file created under the "work" > directory in SPARK_HOME but ... is that the right way? > I think that is the

Re: Scala, Python or Java for Spark programming

2017-06-08 Thread JB Data
Java is an object language born to data, Python is a data language born to objects, or else... Each one has its own uses. @JBD 2017-06-08 8:44 GMT+02:00 Jörn Franke : > A slight advantage of Java is also the tooling that exists around it - > better

Output of select in non exponential form.

2017-06-08 Thread kundan kumar
predictions.select("prediction", "label", "features").show(5) I have labels as line numbers, but they are getting printed in exponential format. Is there a way to print them in normal double notation? Kundan
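
One way to avoid scientific notation at display time is to format the column explicitly; a sketch using format_number, assuming prediction is a numeric column (column names follow the snippet above):

```scala
import org.apache.spark.sql.functions.{col, format_number}

// format_number renders the value with a fixed number of decimal
// places, so large or small doubles print without an exponent.
predictions
  .select(format_number(col("prediction"), 4).as("prediction"),
          col("label"), col("features"))
  .show(5)
```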

[Spark STREAMING]: Can not kill job gracefully on spark standalone cluster

2017-06-08 Thread Mariusz D.
There is a problem with killing jobs gracefully in Spark 2.1.0 with spark.streaming.stopGracefullyOnShutdown enabled. I tested killing Spark jobs in many ways and reached some conclusions. 1. With the command spark-submit --master spark:// --kill driver-id, the results: it killed all workers almost
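
For reference, the flag being tested is an ordinary Spark conf setting; a minimal sketch of enabling it (app name and batch interval are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("GracefulShutdownTest")
  // Ask the shutdown hook to let in-flight batches finish before exiting.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

val ssc = new StreamingContext(conf, Seconds(10))
```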

Re: Performance issue when running Spark-1.6.1 in yarn-client mode with Hadoop 2.6.0

2017-06-08 Thread Satish John Bosco
I have tried the configuration calculator sheet provided by Cloudera as well, but saw no improvements. However, ignoring the 17 mil operation to begin with, let us consider the simple sort on YARN and Spark, which shows a tremendous difference. The operation is simple: a selected numeric column to be sorted

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Chanh Le
Can you recommend one? Thanks. On Thu, Jun 8, 2017 at 2:47 PM Jörn Franke wrote: > You can change the CSV parser library > > On 8. Jun 2017, at 08:35, Chanh Le wrote: > > > I did add mode -> DROPMALFORMED but it still couldn't ignore it because > the

[Spark Core] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Даша Ковальчук
Hi guys, I need to execute Hive queries on a remote Hive server from Spark, but for some reason I receive only column names (without data). The data is available in the table; I checked it via HUE and a Java JDBC connection. Here is my code example: val test = spark.read .option("url",
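
For context, the read being attempted presumably follows Spark's generic JDBC source; a sketch with hypothetical connection details is below. One known pitfall with this route is that Spark's identifier quoting does not match the Hive JDBC driver's, which can yield exactly this symptom of column names echoed back instead of data; the hive.metastore.uris approach suggested elsewhere in the thread sidesteps it.

```scala
// Hypothetical host, port, and table; hive-jdbc must be on the classpath.
val test = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://remote-host:10000/default")
  .option("dbtable", "mytable")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()

test.show(10)
```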

Re: [Spark JDBC] Does spark support read from remote Hive server via JDBC

2017-06-08 Thread Patrik Medvedev
Hello guys, Can somebody help me with my problem? Let me know if you need more details. Wed, 7 Jun 2017 at 16:43, Patrik Medvedev : > No, I don't. > > Wed, 7 Jun 2017 at 16:42, Jean Georges Perrin : >> Do you have some other security in place

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Jörn Franke
You can change the CSV parser library. > On 8 Jun 2017, at 08:35, Chanh Le wrote: > > > I did add mode -> DROPMALFORMED but it still couldn't ignore it because the > error is raised from the CSV library that Spark is using. > > >> On Thu, Jun 8, 2017 at 12:11 PM Jörn

Re: Scala, Python or Java for Spark programming

2017-06-08 Thread Jörn Franke
A slight advantage of Java is also the tooling that exists around it - better support by build tools and plugins, advanced static code analysis (security, bugs, performance), etc. > On 8 Jun 2017, at 08:20, Mich Talebzadeh wrote: > > What I like about Scala is that

Re: [CSV] If number of columns of one row bigger than maxcolumns it stop the whole parsing process.

2017-06-08 Thread Chanh Le
I did add mode -> DROPMALFORMED but it still couldn't ignore it because the error is raised from the CSV library that Spark is using. On Thu, Jun 8, 2017 at 12:11 PM Jörn Franke wrote: > The CSV data source allows you to skip invalid lines - this should also > include lines

Re: Scala, Python or Java for Spark programming

2017-06-08 Thread Mich Talebzadeh
What I like about Scala is that it is less ceremonial compared to Java. Java users claim that Scala is built on Java, so error tracking is very difficult. Also, Scala sits on top of Java, and that makes it virtually dependent on Java. For me, the advantage of Scala is its simplicity and