Hi All,
Is there any way to convert an MLlib matrix to a Breeze DenseMatrix?
Any leads are appreciated.
Thanks,
Naveen
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h
Liang yblia...@gmail.com
mailto:yblia...@gmail.com wrote:
You can use Matrix.toBreeze()
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala#L56
.
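Note that Matrix.toBreeze is private[mllib] in some Spark versions, so a version-independent sketch is to rebuild the Breeze matrix from the value array (mllib stores dense matrices in column-major order, like Breeze). This also answers the determinant question via breeze.linalg.det:

```scala
import breeze.linalg.{det, DenseMatrix => BDM}
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Rebuild a Breeze DenseMatrix from an mllib Matrix. Both store dense
// values in column-major order, so toArray can back the Breeze matrix.
def toBreeze(m: Matrix): BDM[Double] =
  new BDM[Double](m.numRows, m.numCols, m.toArray)

val m = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val bm = toBreeze(m)
println(det(bm)) // determinant via Breeze, e.g. for a covariance matrix
```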
2015-08-20 18:24 GMT+08:00 Naveen nav...@formcept.com
mailto:nav
Hi,
Is there any function to find the determinant of a mllib.linalg.Matrix
(a covariance matrix) using Spark?
Regards,
Naveen
val regParam = 0.01
val regType = "L2"
val algorithm = new LinearRegressionWithSGD()
algorithm.optimizer.setNumIterations(numIterations).setStepSize(stepSize).setRegParam(regParam)
val model = algorithm.run(parsedTrainData)
Regards,
Naveen
Hi Keith,
Can you try including a clean-up step at the end of the job, before the driver
is out of the SparkContext, to clean the necessary files through some regex
patterns or so, on all nodes in your cluster by default? If files are not
available on a few nodes, that should not be a problem, isn't it?
On
gt;
> Anyway, if you run a Spark application you would have multiple jobs, which
> makes sense and is not a problem.
>
>
>
> Thanks David.
>
>
>
> *From:* Naveen [mailto:hadoopst...@gmail.com]
> *Sent:* Wednesday, December 21, 2016 9:18 AM
> *To:* d...@spark.apache.o
Hi Team,
Is it ok to spawn multiple Spark jobs within a main Spark job? My main
Spark job's driver, which was launched on the YARN cluster, will do some
preprocessing and based on it, it needs to launch multiple Spark jobs on the
YARN cluster. Not sure if this is the right pattern.
Please share your thoughts.
a context on the application launching the
> jobs?
> You can use SparkLauncher in a normal app and just listen for state
> transitions
>
> On Wed, 21 Dec 2016, 11:44 Naveen, <hadoopst...@gmail.com> wrote:
>
>> Hi Team,
>>
>> Thanks for your responses.
>> Let me gi
sparkcontexts will get different nodes / executors from resource
manager?
On Wed, Dec 21, 2016 at 6:43 PM, Naveen <hadoopst...@gmail.com> wrote:
> Hi Sebastian,
>
> Yes, for fetching the details from Hive and HBase, I would want to use
> Spark's HiveContext etc.
> However, based on you
Hi,
Please use the SparkLauncher API class and invoke the threads using async
calls using Futures.
Using SparkLauncher, you can mention the class name, application resource,
arguments to be passed to the driver, deploy-mode etc.
I would suggest using Scala's Future, if Scala code is possible.
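A minimal sketch of that pattern; the jar path, main class, and input paths below are hypothetical stand-ins:

```scala
import org.apache.spark.launcher.SparkLauncher
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SubJobLauncher {
  // Launch one sub-job asynchronously; the Future completes with
  // the exit code of the spark-submit process.
  def launch(inputPath: String): Future[Int] = Future {
    new SparkLauncher()
      .setAppResource("/path/to/sub-job.jar")  // hypothetical application resource
      .setMainClass("com.example.SubJob")      // hypothetical main class
      .setMaster("yarn")
      .setDeployMode("cluster")
      .addAppArgs(inputPath)                   // arguments passed to the driver
      .launch()
      .waitFor()
  }

  def main(args: Array[String]): Unit = {
    val jobs = Seq("/data/part1", "/data/part2").map(launch)
    val codes = Await.result(Future.sequence(jobs), Duration.Inf)
    require(codes.forall(_ == 0), s"some sub-jobs failed: $codes")
  }
}
```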
Thanks Liang, Vadim and everyone for your inputs!!
With this clarity, I've tried client modes for both main and sub-spark
jobs. Every main spark job and its corresponding threaded spark jobs are
coming up on the YARN applications list and the jobs are getting executed
properly. I need to now test
Hi All,
I am trying to run a sample Spark program using Scala SBT,
Below is the program,
def main(args: Array[String]) {
val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should
be some file on your system
val sc = new SparkContext("local", "Simple App",
.
Lines with a: 24, Lines with b: 15
The exception seems to be happening with Spark cleanup after executing
your code. Try adding sc.stop() at the end of your program to see if the
exception goes away.
On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire
vmadh...@umail.iu.edu wrote
Cloudera blog has some details.
Please check if this is helpful to you.
http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
Thanks.
On Wed, May 20, 2015 at 4:21 AM, donhoff_h 165612...@qq.com wrote:
Hi, all
I wrote a program to get HBaseConfiguration object in Spark.
Hi Marcelo, Quick Question.
I am using Spark 1.3 and the YARN client mode. It is working well,
provided I manually pip-install all the 3rd party libraries like
numpy etc. on the executor nodes.
So does the SPARK-5479 fix in 1.5 which you mentioned address this as well?
Thanks.
On Thu, Jun
Hi All,
I am working with dataframes and have been struggling with this thing, any
pointers would be helpful.
I've a Json file with the schema like this,
links: array (nullable = true)
||-- element: struct (containsNull = true)
|||-- desc: string (nullable = true)
|||--
Hi All,
I am running the Wikipedia parsing example present in the Advanced
Analytics with Spark book.
https://github.com/sryza/aas/blob/d3f62ef3ed43a59140f4ae8afbe2ef81fc643ef2/ch06-lsa/src/main/scala/com/cloudera/datascience/lsa/ParseWikipedia.scala#l112
The partitions of the RDD returned by
Hi,
I am running pyspark on Windows and I am seeing an error while adding
pyFiles to the SparkContext. Below is the example,
sc = SparkContext("local", "Sample", pyFiles="C:/sample/yattag.zip")
This fails with a "no file found" error for C
The below logic is treating the path as individual files like C,
Hi,
I am new to Spark and need some guidance on the below mentioned points:
1) I am using Spark 1.2; is it possible to see how much memory is being
allocated to an executor from the web UI? If not, how can we figure that out?
2) I am interested in the source code of MLlib; is it possible to get access to
I am facing the same issue. I tried this but am getting a compilation error
for the $ in the explode function,
so I had to modify it to the below to make it work.
df.select(explode(new Column("entities.user_mentions")).as("mention"))
On Wed, Jun 24, 2015 at 2:48 PM, Michael Armbrust
use spark-testing-base from
spark-packages.org as a basis for your unittests.
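A minimal sketch of such a test, assuming the spark-testing-base dependency is on the classpath (the suite and test names are illustrative):

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// SharedSparkContext provides a SparkContext (sc) that is created once
// and shared across all tests in the suite.
class WordCountSuite extends FunSuite with SharedSparkContext {
  test("counting words") {
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("a") === 2)
  }
}
```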
On Fri, Jul 10, 2015 at 12:03 PM, Daniel Siegmann
daniel.siegm...@teamaol.com wrote:
On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire vmadh...@umail.iu.edu
wrote:
I want to write junit test cases in scala
I had a similar issue with Spark 1.3.
After migrating to Spark 1.4 and using sqlContext.read.json, it worked well.
I think you can look at the DataFrame select and explode options to read the
nested JSON elements, arrays etc.
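A sketch of that approach against the links/desc schema shown earlier, assuming Spark 1.4+ (the file path is hypothetical):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.explode

val sqlContext: SQLContext = ???  // obtained from an existing SparkContext
val df = sqlContext.read.json("/path/to/file.json")  // hypothetical path

// explode turns each element of the links array into its own row;
// nested struct fields are then reachable with dot notation.
val links = df.select(explode(df("links")).as("link"))
links.select("link.desc").show()
```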
Thanks.
On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu dav...@databricks.com
I am using the below code with the Kryo serializer. 1) When I run this code I
get this error: Task not serializable (in the commented line). 2) How are
broadcast variables treated in executors? Are they local variables, or can
they be used in any function defined as global variables?
object
Yes, I did this recently. You need to copy the Cloudera cluster related
conf files onto the local machine
and set HADOOP_CONF_DIR or YARN_CONF_DIR.
The local machine should also be able to ssh to the Cloudera cluster.
On Wed, Jul 15, 2015 at 8:51 AM, ayan guha guha.a...@gmail.com wrote:
Hi,
I want to write JUnit test cases in Scala for testing a Spark application. Is
there any guide or link which I can refer to?
Thank you very much.
-Naveen
)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
... 3 more
Regards,
Naveen
st to use Flume, if possible, as it has in built HDFS log
> rolling capabilities
>
> On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire <vmadh...@umail.iu.edu>
> wrote:
>
>> Hi,
>>
>> I am using spark streaming with 1 minute duration to read data from kafka
>&
and create a HDFS directory say *every 30
minutes* instead of duration of the spark streaming application?
Any help would be appreciated.
Thanks,
Naveen
k gets mapped to a separate python process? The
reason I ask is I want to be able to use the mapPartitions method to load a
batch of files and run inference on them separately, for which I need to load
the object once per task. Any
Thanks for your time in answering my question.
Cheers, Naveen
)
rdd = rdd.map(lambda x: Model.predict(x, args))  # fails here with:
# pickle.PicklingError: Could not serialize object: TypeError: can't
# pickle thread.lock objects
Thanks, Naveen
Hi,
I am trying to fetch data from Oracle DB using a subquery and experiencing lot
of performance issues.
Below is the query I am using,
Using Spark 2.0.2
val df = spark_session.read.format("jdbc")
.option("driver","oracle.jdbc.OracleDriver")
.option("url", jdbc_url)
.option("user", user)
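One common way to push the subquery down to Oracle and parallelize the read is to pass it as an aliased dbtable together with the JDBC partitioning options. A sketch completing the snippet above; the subquery, partition column, and bounds are hypothetical:

```scala
// The aliased subquery runs inside Oracle; Spark only reads its result.
// partitionColumn/lowerBound/upperBound/numPartitions split the read into
// parallel range queries instead of a single serial fetch.
val df = spark_session.read.format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("url", jdbc_url)
  .option("user", user)
  .option("password", password)
  .option("dbtable", "(SELECT id, amount FROM orders WHERE ds = '2017-01-01') t")
  .option("partitionColumn", "id")   // numeric column to split on
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()
```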
on my local machine. I am not able to
find a way to debug.
Please let me know the ways to debug my driver program as well as executor
programs
Regards,
Naveen.
an executor is using to process my jobs?
4) Do we have any chance to control the batch division on nodes?
Please give some clarity on above.
Thanks Regards,
Naveen
into 2 batches of 500 size.
Regards,
Naveen.
.
How to check how many cores are running to complete the task of the 8
datasets? (Is there any command or UI to check that?)
Regards,
Naveen.
From: holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] On Behalf Of
Holden Karau
Sent: Friday, November 07, 2014 12:46 PM
To: Naveen Kumar Pokala
Cc: user
Hi,
I am using Spark 1.1.0. I need help regarding saving an RDD in a JSON file.
How to do that? And how to mention the HDFS path in the program?
-Naveen
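A sketch of one way to do this on Spark 1.1, which has no built-in JSON writer for RDDs: serialize each record with json4s (assumed to be on the classpath) and save as text. The record type and HDFS URL are hypothetical:

```scala
import org.apache.json4s.DefaultFormats
import org.apache.json4s.jackson.Serialization

case class Student(id: Int, name: String)  // hypothetical record type
implicit val formats = DefaultFormats

// Serialize each record to a JSON string, then write one line per record
// to HDFS via the full hdfs:// URL (namenode host/port are placeholders).
val rdd = sc.parallelize(Seq(Student(1, "a"), Student(2, "b")))
rdd.map(s => Serialization.write(s))
   .saveAsTextFile("hdfs://namenode:8020/user/naveen/students.json")
```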
(JavaSQLContext.scala:90)
at sample.spark.test.SparkJob.main(SparkJob.java:33)
... 5 more
Please help me.
Regards,
Naveen.
)
case class Instrument(issue: Issue = null)
-Naveen
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, November 12, 2014 12:09 AM
To: Xiangrui Meng
Cc: Naveen Kumar Pokala; user@spark.apache.org
Subject: Re: scala.MatchError
Xiangrui is correct that it must be a Java bean
Hi,
How to set the above properties on JavaSQLContext? I am not able to see a
setConf method on the JavaSQLContext object.
I have added the spark-core jar and spark-assembly jar to my build path, and
I am using Spark 1.1.0 and Hadoop 2.4.0.
--Naveen
Thanks Akhil.
-Naveen
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, November 12, 2014 6:38 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Spark SQL configurations
JavaSQLContext.sqlContext.setConf is available.
Thanks
Best Regards
On Wed, Nov 12, 2014
)
java.lang.Thread.run(Thread.java:745)
Please help me.
Regards,
Naveen.
)
at java.lang.System.load(System.java:1083)
at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39)
... 29 more
-Naveen.
.
How to read that file? I mean, each line as an object of Student.
-Naveen
)
java.lang.Thread.run(Thread.java:745)
How to handle this?
-Naveen
Thanks Akhil.
-Naveen.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, November 18, 2014 1:19 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Null pointer exception with larger datasets
Make sure your list is not null, if that is null then its more like
Hi,
I want to submit my Spark program from my machine to a YARN cluster in
yarn-client mode.
How to specify all the required details through the Spark submitter?
Please provide me some details.
-Naveen.
Hi Akhil,
But the driver and YARN are in different networks; how do I specify the
(export HADOOP_CONF_DIR=XXX) path?
The driver is on my Windows machine and YARN is on some Unix machine on a
different network.
-Naveen.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Monday, November 24
Hi,
While submitting your Spark job, mention --executor-cores 2 --num-executors 24;
it will divide the dataset into 24*2 parquet files.
Or set the spark.default.parallelism value, like 50, on the SparkConf object.
It will divide the dataset into 50 files in your HDFS.
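The SparkConf route above can be sketched as follows (the app name is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.default.parallelism sets the default number of partitions used
// by shuffles and parallelize, which in turn controls output file count.
val conf = new SparkConf()
  .setAppName("ParallelismExample")  // hypothetical app name
  .set("spark.default.parallelism", "50")
val sc = new SparkContext(conf)
```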
-Naveen
-Original Message
Hi.
Is there a way to submit a Spark job on a Hadoop-YARN cluster from Java code?
-Naveen
)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Can anyone please suggest how to resolve the issue?
-Naveen
Error from python worker:
python: module pyspark.daemon not found
PYTHONPATH was:
/home/npokala/data/spark-install/spark-master/python:
Please can somebody help me on this, how to resolve the issue.
-Naveen
)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Please help me to resolve this issue.
-Naveen
/user/HeartbeatReceiver
15/01/29 17:21:28 INFO SparkILoop: Created spark context..
Spark context available as sc.
-Naveen
Hi,
Has anybody tried to connect to a Spark cluster (on Unix machines) from a
Windows interactive shell?
-Naveen.
help me on this?
Thanks,
Naveen
pache.spark.sql.DataFrame (exprs:
scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame
(aggExpr: (String, String),aggExprs: (String,
String)*)org.apache.spark.sql.DataFrame cannot be applied to
(org.apache.spark.sql.Column)
Naveen
read.java:745)
Thanks,
Naveen Kumar Pokala