Hi Chandan,
MLlib only supports getting the p-value and t-value from the Linear Regression
model; other models such as the Logistic Regression model are not supported
currently. This feature is under development and will be released in the next
version (Spark 2.0).
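For completeness, a minimal sketch of reading them off a fitted spark.ml
linear model (the `training` DataFrame of label/features columns is an
assumption):

import org.apache.spark.ml.regression.LinearRegression

val model = new LinearRegression().fit(training) // `training`: assumed (label, features) DataFrame
println(model.summary.pValues.mkString(", "))    // p-values for the model coefficients
println(model.summary.tValues.mkString(", "))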
Thanks
Yanbo
2016-01-18 16:45 GMT+08:00 Chandan Verma
Hi all,
I have calculated a covariance; it's a Matrix type. Now I want to save
the result to HDFS. How can I do it?
Thanks
Hi Yash,
Basically, my question is how to avoid storing the Kafka offsets in the
Spark checkpoint directory. The streaming context is getting built from the
checkpoint directory and proceeding with the offsets in the checkpointed RDD.
I want to consume data from Kafka from specific offsets along with the
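(For reference, a minimal sketch of the Kafka direct-stream API that takes
explicit starting offsets rather than checkpoint-recovered ones; `ssc`,
`kafkaParams`, and the topic/partition/offset values are assumptions:)

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// start reading the assumed topic "events", partition 0, at offset 12345
val fromOffsets = Map(TopicAndPartition("events", 0) -> 12345L)
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))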
Hi everybody,
Since I am new to Spark, I am familiarizing myself with it by writing
CPU-intensive applications like k-means and k-NN. However, I observe some
threads other than the worker threads using a lot of CPU. In particular, in
jvisualvm, I observe the Acceptor and qtp threads showing such behavior,
(Apologies if this has arrived more than once. I've subscribed to the list,
and tried posting via email with no success. This is an attempt through the
Nabble interface.)
I've been having lots of trouble with DataFrames whose columns have dots in
their names today. I know that in many places,
Hi,
You should be able to point Hive to Tachyon instead of HDFS, and that
should allow Hive to access data in Tachyon. If Spark SQL was pointing to
an HDFS file, you could instead point it to a Tachyon file, and that should
work too.
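For example, a minimal sketch of the same Spark SQL read pointed at HDFS
versus Tachyon (hosts, ports, and paths are assumptions):

// the same data, read from HDFS and then from Tachyon instead
val fromHdfs    = sqlContext.read.parquet("hdfs://namenode:9000/warehouse/mytable")
val fromTachyon = sqlContext.read.parquet("tachyon://master:19998/warehouse/mytable")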
Hope that helps,
Gene
On Wed, Jan 20, 2016 at 2:06 AM, Sea
Hi all,
I have calculated a covariance; it's a Matrix type. Now I want to save
the result to HDFS. How can I do it?
Thanks
Hi,
I have two files
File1 (group-by configuration):

  Field    Group by Condition
  Field1   Y
  Field2   N
  Field3   Y

File2 is a data file having field1, field2, field3, etc.:

  field1   field2   field3   field4   field5
  data1    data2    data3    data4    data5
  data11   data22   data33   data44   data55
Now my requirement is to group
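(A hypothetical sketch of one way this could look, assuming File1 has already
been parsed into a field-to-flag map and `dataDf` holds File2; all names are
assumptions:)

// pick the columns flagged "Y" in File1 and group the File2 data by them
val flags = Map("field1" -> "Y", "field2" -> "N", "field3" -> "Y") // parsed from File1
val groupCols = flags.collect { case (col, "Y") => col }.toSeq
val grouped = dataDf.groupBy(groupCols.map(dataDf.col): _*).count()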
Thanks Mohammed and Ted.
I will try out the options and let you all know the progress. I also posted
in the Spark Cassandra Connector community and got a similar response.
Regards
Vivek
On Sat, Jan 23, 2016 at 11:37 AM, Mohammed Guller wrote:
Hi Andy,
I will take a look at your code after you share it.
Thanks!
Yanbo
2016-01-23 0:18 GMT+08:00 Andy Davidson :
> Hi Yanbo
>
> I recently coded up the trivial example from
>
Hi,
Any suggestions on this approach?
Regards,
Rajesh
On Sat, Jan 23, 2016 at 11:24 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:
> Hi,
>
> I have a big database table (1 million plus records) in Oracle. I need to
> query records based on input numbers. For this use case, I am
Unfortunately, I am still getting the error when using .show() with `false`,
`False`, or `FALSE`:
Py4JError: An error occurred while calling o153.showString. Trace:
py4j.Py4JException: Method showString([class java.lang.String, class
java.lang.Boolean]) does not exist
at
Hi Devesh,
RFormula will encode category variables (columns of string type) as dummy
variables automatically. You do not need to do the dummy transform explicitly
if you want to train a machine learning model using SparkR, although SparkR
currently supports only a limited set of ML algorithms (GLM).
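(Under the hood this is the spark.ml RFormula transformer; a minimal Scala
sketch of the same encoding, where the DataFrame and its column names are
assumptions:)

import org.apache.spark.ml.feature.RFormula

// "country" is an assumed string column; RFormula one-hot encodes it into dummy variables
val formula = new RFormula().setFormula("clicked ~ country + hour")
val encoded = formula.fit(df).transform(df) // adds "features" and "label" columns
encoded.select("features", "label").show()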
Thanks
The Matrix can be saved as a column of type MatrixUDT.
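A minimal sketch of that route, assuming MatrixUDT is available in your Spark
version (the output path is an assumption):

import org.apache.spark.mllib.linalg.Matrix

// `cov` is the Matrix from computeCovariance(); its UDT lets it sit in a DataFrame column
val df = sqlContext.createDataFrame(Seq(Tuple1(cov))).toDF("cov")
df.write.parquet("hdfs:///tmp/covariance.parquet")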
Hi All,
How long are the shuffle files and data files stored in the block manager
folder of the workers?
I have a Spark streaming job with a window duration of 2 hours and a slide
interval of 15 minutes.
When I execute the following command in my block manager path
find . -type f -cmin +150 -name
Hi Yanbo,
I'm using the Java language and the environment is Spark 1.4.1.
Can you tell me how to do it in more detail? The following is my code; how
can I save the cov to an HDFS file?

import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;

RowMatrix mat = new RowMatrix(rows.rdd());
Matrix cov = mat.computeCovariance();
Hi,
I have applied the following code on the airquality dataset available in R,
which has some missing values. I want to omit the rows which have NAs:

library(SparkR)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"')
sc <-
The solution I normally use is to zipWithIndex() and then use the filter
operation. Filter is an O(m) operation where m is the size of your
partition, not an O(N) operation.
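A minimal sketch of this (`rdd`, `startIdx`, and `endIdx` are assumptions):

// pair each element with its index, then keep only the wanted index range
val indexed = rdd.zipWithIndex()
val slice = indexed.filter { case (_, i) => i >= startIdx && i < endIdx }.map(_._1)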
-Ilya Ganelin
On Sat, Jan 23, 2016 at 5:48 AM, Nirav Patel wrote:
> Problem is I have RDD of
One thing you can also look at is to save your data in a way that can be
accessed through file patterns, e.g. by hour, zone, etc., so that you only
load what you need.
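A minimal sketch of that layout using partitioned Parquet output (column and
path names are assumptions):

// write once, partitioned by hour, so later jobs read only the hours they need
df.write.partitionBy("hour").parquet("hdfs:///data/events")
val nineAm = sqlContext.read.parquet("hdfs:///data/events/hour=9")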
On Jan 24, 2016 10:00 PM, "Ilya Ganelin" wrote:
> The solution I normally use is to zipWithIndex() and then use
I am not getting anywhere with any of the suggestions so far. :(
I am trying some more outlets and will share any solution I find.
- Isaac
On Jan 23, 2016, at 1:48 AM, Renu Yadav wrote:
If you turn spark.speculation on, then that might help. It worked.
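A minimal sketch of turning it on (the rest of the conf is assumed):

// re-launch straggling tasks speculatively on other executors
val conf = new SparkConf().set("spark.speculation", "true")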
Hi All,
I have a machine with the following configuration:
32 GB RAM
500 GB HDD
8 CPUs
Following are the parameters I'm starting my Spark context with:

val conf = new SparkConf()
  .setAppName("MasterApp")
  .setMaster("local[1]")
  .set("spark.executor.memory", "20g")
I'm reading a 4.3 GB file and
> I'm reading a 4.3 GB file
The contents of the file can be held in one executor.
Can you try files with a much larger size?
Cheers
On Sun, Jan 24, 2016 at 12:11 PM, jimitkr wrote:
> Hi All,
>
> I have a machine with the following configuration:
> 32 GB RAM
> 500 GB HDD
I've been having lots of trouble with DataFrames whose columns have dots in
their names today. I know that in many places, backticks can be used to
quote column names, but the problem I'm running into now is that I can't
drop a column that has *no* dots in its name when there are *other* columns
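A minimal sketch of the backtick quoting mentioned above (the DataFrame and
its column names are assumptions):

// "a.b" contains a dot; backticks let you reference it without it being parsed as struct access
val df = sqlContext.createDataFrame(Seq((1, 2))).toDF("a.b", "plain")
df.select("`a.b`").show()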