RMSE recommender system

2017-05-20 Thread Arun
Hi all, I am new to machine learning and I am working on a recommender system. On the training dataset the RMSE is 0.08, while on the test data it is 2.345. What can I conclude from this, and what steps can I take to improve it?
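A train RMSE far below the test RMSE, as in the figures above, is the usual signature of overfitting, although test users or items that never appeared in training can also inflate the test error. The sketch below is only an illustration in plain Python, not part of the original post: `rmse` and `gap_ratio` are hypothetical helper names, and the only data taken from the message are the two RMSE values.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def gap_ratio(train_rmse, test_rmse):
    """How many times larger the test error is than the training error."""
    return test_rmse / train_rmse


# The two values reported in the post (hypothetical usage of the helpers).
ratio = gap_ratio(0.08, 2.345)
print(f"test RMSE is {ratio:.1f}x the train RMSE")
```

A ratio this large suggests the model has memorized the training ratings. Common things to try, none of them specific to this poster's setup, are stronger regularization, a smaller model (for example a lower rank in matrix factorization), and choosing those settings by cross-validation on held-out data.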

Re: Documentation on "Automatic file coalescing for native data sources"?

2017-05-20 Thread Kabeer Ahmed
Thank you, Takeshi. As far as I can see from the code you pointed to, the default number of bytes to pack into a partition is set to 128 MB, the size of a Parquet block. Daniel, it seems you need to modify the number of bytes you want to pack per partition. I am curious to know the scenario. Please

couple naive questions on Spark Structured Streaming

2017-05-20 Thread kant kodali
Hi, 1. Can we use Spark Structured Streaming for stateless transformations, just as we would with DStreams, or is Spark Structured Streaming only meant for stateful computations? 2. When we use groupBy and Window operations for event-time processing and specify a watermark, does this mean the

unsubscribe

2017-05-20 Thread williamtellme123
unsubscribe From: Abir Chakraborty [mailto:abi...@247-inc.com] Sent: Saturday, May 20, 2017 1:29 AM To: user@spark.apache.org Subject: unsubscribe

Re: Documentation on "Automatic file coalescing for native data sources"?

2017-05-20 Thread Takeshi Yamamuro
I think this document refers to the logic here: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L418 This logic merges small files into a partition, and you can control the threshold via `spark.sql.files.maxPartitionBytes`.
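To make the idea of merging small files into a partition concrete, here is a minimal sketch in plain Python. It is a simplified greedy packing written for illustration, not Spark's implementation: the function name `pack_files` is hypothetical, and unlike the linked Scala code it does not split large files or charge any per-file open cost. The 128 MB default mirrors the value discussed in this thread for `spark.sql.files.maxPartitionBytes`.

```python
MB = 1024 * 1024


def pack_files(file_sizes, max_partition_bytes=128 * MB):
    """Greedily group file sizes (in bytes) into partitions.

    Simplified illustration only: a partition is closed as soon as adding
    the next file would push it past max_partition_bytes. A single file
    larger than the threshold ends up alone in its own partition.
    """
    partitions = []
    current, current_size = [], 0
    # Largest files first, so big files are placed before small ones fill the gaps.
    for size in sorted(file_sizes, reverse=True):
        if current and current_size + size > max_partition_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        partitions.append(current)
    return partitions


# Thirty 10 MB files: twelve fit under 128 MB, a thirteenth would not.
parts = pack_files([10 * MB] * 30)
print([len(p) for p in parts])
```

Lowering the threshold in this sketch produces more, smaller partitions, and raising it produces fewer, larger ones, which is the trade-off the setting is meant to control.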

Re: SparkSQL not able to read a empty table location

2017-05-20 Thread Steve Loughran
On 20 May 2017, at 01:44, Bajpai, Amit X. -ND wrote: Hi, I have a Hive external table whose S3 location contains no files (although the S3 location directory does exist). When I try to use Spark SQL to count the number of records in

unsubscribe

2017-05-20 Thread Abir Chakraborty