You can also mount HDFS through the NFS gateway and access it that way, I think.
Thanks
Best Regards
On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu d...@felstar.com wrote:
http://hortonworks.com/blog/windows-explorer-experience-hdfs/
Seemed to exist, now no sign.
Anything similar to tie HDFS into
Are you starting your history server?
./sbin/start-history-server.sh
You can read more here
http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact
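For the history server to show anything after the fact, the application has to write event logs first. A minimal spark-defaults.conf sketch, with a placeholder HDFS path:

```
# spark-defaults.conf - applications write event logs here...
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs:///spark-logs
# ...and the history server replays them from the same directory
spark.history.fs.logDirectory   hdfs:///spark-logs
```

With those set, ./sbin/start-history-server.sh picks up finished applications from that directory.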
Thanks
Best Regards
On Tue, Aug 25, 2015 at 1:07 AM, b.bhavesh b.borisan...@gmail.com wrote:
Hi,
I am working on
See
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
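The linked doc boils down to an NFSv3 mount on the client side. An illustrative command sketch (the gateway host and mount point are placeholders; run as root):

```
# Mount HDFS exported by the NFS gateway (NFSv3 only)
mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync <nfs_gateway_host>:/ /mnt/hdfs
# Then browse it like a local filesystem
ls /mnt/hdfs
```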
FYI
On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You can also mount HDFS through the NFS gateway and access it that way, I think.
Thanks
Best Regards
On Tue, Aug 25, 2015 at
I suspect in the last scenario you have an empty new line at the last
line. If you put a try..catch in, you'd definitely know.
Thanks
Best Regards
On Tue, Aug 25, 2015 at 2:53 AM, Michael Armbrust mich...@databricks.com
wrote:
This top line here is indicating that the exception is being
If HDFS is on a Linux VM, you could also mount it with FUSE and export it
with Samba
2015-08-29 2:26 GMT-07:00 Ted Yu yuzhih...@gmail.com:
See
https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
FYI
On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das
It depends: if HDFS is running under Windows, FUSE won't work, but if HDFS
is on a Linux VM, box, or cluster, then you can have the Linux box/VM mount
HDFS through FUSE and at the same time export its mount point over Samba. At
that point, your Windows machine can just connect to the Samba share.
R.
I'm using Windows.
Are you saying it works with Windows?
Dino.
On 29 August 2015 at 09:04, Akhil Das ak...@sigmoidanalytics.com wrote:
You can also mount HDFS through the NFS gateway and access it that way, I think.
Thanks
Best Regards
On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu d...@felstar.com
I am using Apache Spark for analyzing query logs. I already faced some
difficulties setting up Spark. Now I am using a standalone cluster to process
queries.
First I used the example code in Java to count words, which worked fine. But
when I try to connect it to a MySQL
So either you have an empty line at the end, or when you use String.split you
don't specify -1 as the second parameter...
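A minimal Scala illustration of that split behavior (plain String.split, no Spark needed): by default trailing empty strings are dropped, so a line like `a,b,,` silently loses its empty fields unless the limit is -1.

```scala
object SplitDemo extends App {
  val line = "a,b,,"
  // Default split drops trailing empty strings...
  val lossy = line.split(",")
  // ...while a limit of -1 keeps them all.
  val full = line.split(",", -1)
  println(lossy.length) // 2
  println(full.length)  // 4
}
```

That length mismatch is a classic cause of ArrayIndexOutOfBoundsException when parsing the last line of a file.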
On Aug 29, 2015 1:18 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
I suspect in the last scenario you are having an empty new line at the
last line. If you put a try..catch you'd
Finally, I found the solution:
on the SparkContext you can set spark.executorEnv.[EnvironmentVariableName],
and these will be available in the environment of the executors.
This is in fact documented, but somehow I missed it.
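A sketch of the documented property in spark-defaults.conf; the variable name and value are placeholders:

```
# spark-defaults.conf - MY_VAR becomes an environment variable in every executor
spark.executorEnv.MY_VAR  some-value
```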
We are using Cassandra for a similar kind of problem and it works well... You
need to take care of the race condition between updating the store and looking
up the store...
On Aug 29, 2015 1:31 AM, Ted Yu yuzhih...@gmail.com wrote:
+1 on Jason's suggestion.
bq. this large variable is broadcast many
Yes, you need to use dimensionality reduction and/or locality-sensitive
hashing to reduce the number of pairs to compare. There is also an LSH
implementation for collections of vectors that I have just published here:
https://github.com/marufaytekin/lsh-spark. The implementation is based on this
paper:
Yes, you can set SPARK_LOCAL_DIRS in spark-env.sh or spark.local.dir in
the spark-defaults.conf file; then it would use this location for the
shuffle writes etc.
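A minimal spark-defaults.conf sketch; the path is a placeholder and must exist and be writable on every node:

```
# spark-defaults.conf - scratch space for shuffle spill and temp files
spark.local.dir  /data/spark-tmp
```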
Thanks
Best Regards
On Wed, Aug 26, 2015 at 6:56 PM, michael.engl...@nomura.com wrote:
Hi,
I am having issues with /tmp
You can do a rdd.saveAsObjectFile for storing and for reading you can do a
sc.objectFile
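A sketch of the round trip, assuming a running SparkContext `sc` and a writable path (both placeholders here). saveAsObjectFile uses Java serialization, so the element class must be Serializable:

```scala
// Save the RDD with its element type preserved (Java serialization)
val rdd = sc.parallelize(Seq(("a", "b", 1.0), ("c", "d", 2.0)))
rdd.saveAsObjectFile("hdfs:///tmp/my-rdd")

// Read it back; the type parameter restores the element type
val restored = sc.objectFile[(String, String, Double)]("hdfs:///tmp/my-rdd")
```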
Thanks
Best Regards
On Thu, Aug 27, 2015 at 9:29 PM, saif.a.ell...@wellsfargo.com wrote:
Hi,
Any way to store/load RDDs keeping their original object instead of string?
I am having trouble with parquet
Thank you, I saw this before, but it is just binary classification, so
how can I extend this to multiclass classification?
Simply add different labels?
e.g.:
new LabeledDocument(0L, "a b c d e spark", 1.0),
new LabeledDocument(1L, "b d", 0.0),
new LabeledDocument(2L, "hadoop f g h", 2.0),
Can you try 1.5? This should work much, much better in 1.5 out of the box.
For 1.4, I think you'd want to turn on sort-merge-join, which is off by
default. However, the sort-merge join in 1.4 can still trigger a lot of
garbage, making it slower. SMJ performance is probably 5x - 1000x better in
I am running the Spark shell (1.2.1) in local mode and I have a simple
RDD[(String,String,Double)] with about 10,000 objects in it. I get a
StackOverFlowError each time I try to run the following code (the code
itself is just representative of other logic where I need to pass in a
variable). I
When you set .setMaster to local[4], it means that you are allocating 4
threads on your local machine. You can change it to local[1] to run it on a
single thread.
If you are submitting the job to a standalone Spark cluster and you want
to limit the number of cores for your job, then you can do it like
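For a standalone cluster, a sketch of the cap in spark-defaults.conf (the value 4 is a placeholder):

```
# spark-defaults.conf - cap the total cores this application takes from the cluster
spark.cores.max  4
```

The same limit can be passed per job as --total-executor-cores on spark-submit.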
Hi!
I want to implement a multiclass classification for documents.
So I have different kinds of text files, and I want to classificate them
with spark mllib in java.
Do you have any code examples?
Thanks!
--
*Egyed Zsombor *
Junior Big Data Engineer
Mobile: +36 70 320 65 81
Is it only available on Mesos?
I'm using a Spark standalone cluster; is there anything similar for it?
On Fri, Aug 28, 2015 at 8:51 AM Akhil Das ak...@sigmoidanalytics.com
wrote:
You can create a custom Mesos framework for your requirement; to get you
started you can check this out
Thanks for the recommendations. I had been focused on solving the problem
within Spark but a distributed database sounds like a better solution.
Jeff
On Sat, Aug 29, 2015 at 11:47 PM, Ted Yu yuzhih...@gmail.com wrote:
Not sure if the race condition you mentioned is related to Cassandra's
data
I am doing some memory tuning on my Spark job on YARN and I notice different
settings would give different results and affect the outcome of the Spark
job run. However, I am confused and do not understand completely why it
happens and would appreciate if someone can provide me with some guidance
Did you try putting a sc.stop at the end of your pipeline?
Thanks
Best Regards
On Thu, Aug 27, 2015 at 6:41 PM, pranay pranay.ton...@impetus.co.in wrote:
I have a Java program that does this (using Spark 1.3.1): create a
command string that uses spark-submit in it (with my Class file etc
What problem are you having? You will have to trigger an action at the end
to execute this piece of code. Like:
rdd.mapPartitions(partitionOfRecords => {
  DBConnectionInit()
  val results = partitionOfRecords.map(..)
  DBConnection.commit()
  results
}).count()
Thanks
Best Regards
On Thu,
Have you tried `build/sbt assembly`?
On Sat, Aug 29, 2015 at 9:03 PM, Muler mulugeta.abe...@gmail.com wrote:
Hi guys,
I can successfully build Spark using IntelliJ, but I'm not able to
locate/generate the Spark assembly (jar file) in the assembly/target directory.
How do I generate one? I have
I would check out the Pipeline code example
https://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
On Sat, Aug 29, 2015 at 9:23 PM, Zsombor Egyed egye...@starschema.net
wrote:
Hi!
I want to implement a multiclass classification for documents.
So I have different kinds of text
I think the spark.ml logistic regression currently only supports 0/1
labels. If you need multiclass, I would suggest looking at the spark.ml
decision trees. If you don't care too much for pipelines, then you
could use the spark.mllib logistic regression after featurizing.
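A sketch of the spark.mllib route mentioned above, assuming a SparkContext `sc` and already-featurized LabeledPoints (the toy vectors and labels here are purely illustrative); LogisticRegressionWithLBFGS supports multiclass via setNumClasses:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Toy featurized documents with labels 0.0, 1.0, 2.0
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0, 0.0)),
  LabeledPoint(2.0, Vectors.dense(0.0, 0.0, 1.0))))

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(3) // multiclass instead of the default binary
  .run(training)

val predicted = model.predict(Vectors.dense(0.0, 0.0, 1.0))
```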
On Sat, Aug 29,
I think you have to use the keyword *set* to set an environment variable in
windows. Check the section Setting environment variables from
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true
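For example, in a Windows command prompt (the variable name and path are placeholders):

```
rem Set for the current cmd session only
set SPARK_HOME=C:\spark
rem Or persist it across future sessions
setx SPARK_HOME C:\spark
```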
Thanks
Best Regards
On Tue, Aug 25, 2015 at
This is more of a Scala-related question; have a look at case classes
in Scala: http://www.scala-lang.org/old/node/107
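A minimal Scala sketch of a generic class along those lines, written as a case class (the names and the trivial method body are illustrative):

```scala
// A generic container: the method's result type follows the type parameter
case class Holder[T](value: T) {
  def some_method(times: Int): T = value // placeholder body returning T
}

object HolderDemo extends App {
  val h = Holder("hello")
  println(h.some_method(2)) // the compiler knows this is a String
}
```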
Thanks
Best Regards
On Tue, Aug 25, 2015 at 6:55 PM, saif.a.ell...@wellsfargo.com wrote:
Hi all,
I have SomeClass[TYPE] { def some_method(args: fixed_type_args): TYPE }