Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Akhil Das
You can also mount HDFS through the NFS gateway and access it, I think. Thanks Best Regards On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu d...@felstar.com wrote: http://hortonworks.com/blog/windows-explorer-experience-hdfs/ Seemed to exist, now no sign. Anything similar to tie HDFS into

Re: History server is not receiving any event

2015-08-29 Thread Akhil Das
Are you starting your history server? ./sbin/start-history-server.sh You can read more here http://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact Thanks Best Regards On Tue, Aug 25, 2015 at 1:07 AM, b.bhavesh b.borisan...@gmail.com wrote: Hi, I am working on
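The history server only displays applications that wrote event logs, so both sides need to agree on a log directory. A minimal configuration sketch (the HDFS path is a placeholder):

```
# spark-defaults.conf on the submitting side
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-events

# configuration read by the history server
spark.history.fs.logDirectory    hdfs:///spark-events
```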

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Ted Yu
See https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html FYI On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You can also mount HDFS through the NFS gateway and access i think. Thanks Best Regards On Tue, Aug 25, 2015 at

Re: Array Out OF Bound Exception

2015-08-29 Thread Akhil Das
I suspect in the last scenario you are having an empty new line at the last line. If you put a try..catch you'd definitely know. Thanks Best Regards On Tue, Aug 25, 2015 at 2:53 AM, Michael Armbrust mich...@databricks.com wrote: This top line here is indicating that the exception is being

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
If HDFS is on a Linux VM, you could also mount it with FUSE and export it with Samba 2015-08-29 2:26 GMT-07:00 Ted Yu yuzhih...@gmail.com: See https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html FYI On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Roberto Congiu
It depends: if HDFS is running under Windows, FUSE won't work, but if HDFS is on a Linux VM, box, or cluster, then you can have the Linux box/VM mount HDFS through FUSE and at the same time export its mount point over Samba. At that point, your Windows machine can just connect to the Samba share. R.

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Dino Fancellu
I'm using Windows. Are you saying it works with Windows? Dino. On 29 August 2015 at 09:04, Akhil Das ak...@sigmoidanalytics.com wrote: You can also mount HDFS through the NFS gateway and access i think. Thanks Best Regards On Tue, Aug 25, 2015 at 3:43 AM, Dino Fancellu d...@felstar.com

Apache Spark Suitable JDBC Driver not found

2015-08-29 Thread shawon
I am using Apache Spark for analyzing query logs. I already faced some difficulties setting up Spark. Now I am using a standalone cluster to process queries. First I used the example code in Java to count words, which worked fine. But when I try to connect it to a MySQL

Re: Array Out OF Bound Exception

2015-08-29 Thread Raghavendra Pandey
So either you have an empty line at the end, or when you use string.split you don't specify -1 as the second parameter... On Aug 29, 2015 1:18 PM, Akhil Das ak...@sigmoidanalytics.com wrote: I suspect in the last scenario you are having an empty new line at the last line. If you put a try..catch you'd
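The split pitfall above is easy to reproduce in plain Java (Scala's String.split delegates to the same java.lang.String method): without a negative limit, trailing empty fields are silently dropped, so indexing the last column of a short row throws ArrayIndexOutOfBoundsException. A minimal sketch:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String row = "a,b,,";

        // Default split drops trailing empty strings: length is 2, not 4
        String[] dropped = row.split(",");
        System.out.println(dropped.length); // 2

        // A negative limit keeps the trailing empty fields
        String[] kept = row.split(",", -1);
        System.out.println(kept.length);    // 4

        // dropped[3] would throw ArrayIndexOutOfBoundsException; kept[3] is safe
        System.out.println(kept[3].isEmpty()); // true
    }
}
```

With the default limit, accessing a fixed column index on lines with trailing empty fields (or on a blank last line) is exactly what produces the out-of-bounds error discussed in this thread.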

Re: How to set environment of worker applications

2015-08-29 Thread Jan Algermissen
Finally, I found the solution: on the Spark context you can set spark.executorEnv.[EnvironmentVariableName] and these will be available in the environment of the executors. This is in fact documented, but somehow I missed it.
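For reference, a sketch of what that looks like (MY_VAR and its value are placeholder names):

```
# spark-defaults.conf
spark.executorEnv.MY_VAR  some-value
```

The same property can also be set programmatically on a SparkConf, e.g. conf.set("spark.executorEnv.MY_VAR", "some-value"), before creating the context.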

Re: Alternative to Large Broadcast Variables

2015-08-29 Thread Raghavendra Pandey
We are using Cassandra for similar kind of problem and it works well... You need to take care of race condition between updating the store and looking up the store... On Aug 29, 2015 1:31 AM, Ted Yu yuzhih...@gmail.com wrote: +1 on Jason's suggestion. bq. this large variable is broadcast many

Re: Build k-NN graph for large dataset

2015-08-29 Thread Maruf Aytekin
Yes, you need to use dimensionality reduction and/or locality-sensitive hashing to reduce the number of pairs to compare. There is also an LSH implementation for collections of vectors I have just published here: https://github.com/marufaytekin/lsh-spark. The implementation is based on this paper:

Re: Spark-on-YARN LOCAL_DIRS location

2015-08-29 Thread Akhil Das
Yes, you can set SPARK_LOCAL_DIRS in spark-env.sh or spark.local.dir in the spark-defaults.conf file; then it would use this location for the shuffle writes etc. Thanks Best Regards On Wed, Aug 26, 2015 at 6:56 PM, michael.engl...@nomura.com wrote: Hi, I am having issues with /tmp
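A sketch of both options (the path is a placeholder; pick a disk with enough space for shuffle spills):

```
# conf/spark-env.sh
export SPARK_LOCAL_DIRS=/data/spark-tmp

# or conf/spark-defaults.conf
spark.local.dir  /data/spark-tmp
```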

Re: Is there a way to store RDD and load it with its original format?

2015-08-29 Thread Akhil Das
You can do an rdd.saveAsObjectFile for storing, and for reading you can do a sc.objectFile. Thanks Best Regards On Thu, Aug 27, 2015 at 9:29 PM, saif.a.ell...@wellsfargo.com wrote: Hi, Any way to store/load RDDs keeping their original object type instead of string? I am having trouble with parquet
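saveAsObjectFile stores records with standard Java serialization, which is why they come back as their original types rather than strings. The underlying round trip can be sketched without Spark at all:

```java
import java.io.*;
import java.util.*;

public class ObjectRoundTrip {
    public static void main(String[] args) throws Exception {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Write the object graph to bytes, as saveAsObjectFile does per batch
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }

        // Read it back: the value is restored with its original type
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Object restored = in.readObject();
            System.out.println(restored.equals(original)); // true
        }
    }
}
```

Note that any class stored this way must implement java.io.Serializable, which Spark requires of such records anyway.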

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Zsombor Egyed
Thank you, I saw this before, but it is just binary classification, so how can I extend this to multiclass classification? Simply add different labels? e.g.: new LabeledDocument(0L, "a b c d e spark", 1.0), new LabeledDocument(1L, "b d", 0.0), new LabeledDocument(2L, "hadoop f g h", 2.0),

Re: How to avoid shuffle errors for a large join ?

2015-08-29 Thread Reynold Xin
Can you try 1.5? This should work much, much better in 1.5 out of the box. For 1.4, I think you'd want to turn on sort-merge-join, which is off by default. However, the sort-merge join in 1.4 can still trigger a lot of garbage, making it slower. SMJ performance is probably 5x - 1000x better in

Spark shell and StackOverFlowError

2015-08-29 Thread ashrowty
I am running the Spark shell (1.2.1) in local mode and I have a simple RDD[(String,String,Double)] with about 10,000 objects in it. I get a StackOverFlowError each time I try to run the following code (the code itself is just representative of other logic where I need to pass in a variable). I

Re: Setting number of CORES from inside the Topology (JAVA code )

2015-08-29 Thread Akhil Das
When you set .setMaster to local[4], it means that you are allocating 4 threads on your local machine. You can change it to local[1] to run it on a single thread. If you are submitting the job to a standalone spark cluster and you want to limit the number of cores for your job, then you can do it like
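For a standalone cluster, the cap applies to the total cores the application may take across the cluster; a sketch of both ways to set it (the value 4 is just an example):

```
# spark-submit flag
--total-executor-cores 4

# or equivalently in spark-defaults.conf
spark.cores.max  4
```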

Spark MLLIB multiclass classification

2015-08-29 Thread Zsombor Egyed
Hi! I want to implement a multiclass classification for documents. So I have different kinds of text files, and I want to classificate them with spark mllib in java. Do you have any code examples? Thanks! -- *Egyed Zsombor * Junior Big Data Engineer Mobile: +36 70 320 65 81 |

Re: How to remove worker node but let it finish first?

2015-08-29 Thread Romi Kuntsman
Is it only available on Mesos? I'm using a Spark standalone cluster; is there anything like it there? On Fri, Aug 28, 2015 at 8:51 AM Akhil Das ak...@sigmoidanalytics.com wrote: You can create a custom mesos framework for your requirement, to get you started you can check this out

Re: Alternative to Large Broadcast Variables

2015-08-29 Thread Hemminger Jeff
Thanks for the recommendations. I had been focused on solving the problem within Spark but a distributed database sounds like a better solution. Jeff On Sat, Aug 29, 2015 at 11:47 PM, Ted Yu yuzhih...@gmail.com wrote: Not sure if the race condition you mentioned is related to Cassandra's data

Spark Effects of Driver Memory, Executor Memory, Driver Memory Overhead and Executor Memory Overhead on success of job runs

2015-08-29 Thread timothy22000
I am doing some memory tuning on my Spark job on YARN and I notice different settings would give different results and affect the outcome of the Spark job run. However, I am confused and do not understand completely why it happens and would appreciate it if someone can provide me with some guidance

Re: spark-submit issue

2015-08-29 Thread Akhil Das
Did you try putting a sc.stop at the end of your pipeline? Thanks Best Regards On Thu, Aug 27, 2015 at 6:41 PM, pranay pranay.ton...@impetus.co.in wrote: I have a java program that does this - (using Spark 1.3.1 ) Create a command string that uses spark-submit in it ( with my Class file etc

Re: commit DB Transaction for each partition

2015-08-29 Thread Akhil Das
What problem are you having? You will have to trigger an action at the end to execute this piece of code, like: rdd.mapPartitions(partitionOfRecords => { DBConnectionInit(); val results = partitionOfRecords.map(..); DBConnection.commit(); results }).count() Thanks Best Regards On Thu,
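The reason the trailing action matters is that partitionOfRecords.map is lazy: nothing runs until something consumes the iterator, so without an action the commit can even fire before any record has been processed. Java streams behave the same way, which allows a Spark-free sketch of the pitfall:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.*;

public class LazyMapDemo {
    public static void main(String[] args) {
        AtomicBoolean mapped = new AtomicBoolean(false);

        // Building the pipeline does not run the mapping function
        Stream<Integer> results = Stream.of(1, 2, 3)
                .map(x -> { mapped.set(true); return x * 2; });

        // Nothing has executed yet: map is lazy, like Iterator.map in Scala
        System.out.println(mapped.get()); // false

        // Only a terminal operation forces the work, which is why the
        // Spark pipeline above needs a trailing action such as count()
        List<Integer> out = results.collect(Collectors.toList());
        System.out.println(mapped.get()); // true
        System.out.println(out);          // [2, 4, 6]
    }
}
```

The same reasoning applies inside mapPartitions: make sure the mapped iterator is actually consumed before (or as part of) committing, rather than assuming the map has already run.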

Re: How to generate spark assembly (jar file) using Intellij

2015-08-29 Thread Feynman Liang
Have you tried `build/sbt assembly`? On Sat, Aug 29, 2015 at 9:03 PM, Muler mulugeta.abe...@gmail.com wrote: Hi guys, I can successfully build Spark using Intellij, but I'm not able to locate/generate the spark assembly (jar file) in the assembly/target directory. How do I generate one? I have

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Feynman Liang
I would check out the Pipeline code example https://spark.apache.org/docs/latest/ml-guide.html#example-pipeline On Sat, Aug 29, 2015 at 9:23 PM, Zsombor Egyed egye...@starschema.net wrote: Hi! I want to implement a multiclass classification for documents. So I have different kinds of text

Re: Spark MLLIB multiclass classification

2015-08-29 Thread Feynman Liang
I think the spark.ml logistic regression currently only supports 0/1 labels. If you need multiclass, I would suggest looking at the spark.ml decision trees. If you don't care too much for pipelines, then you could use the spark.mllib logistic regression after featurizing. On Sat, Aug 29,

Re: Invalid environment variable name when submitting job from windows

2015-08-29 Thread Akhil Das
I think you have to use the keyword *set* to set an environment variable in windows. Check the section Setting environment variables from http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true Thanks Best Regards On Tue, Aug 25, 2015 at
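A minimal sketch in cmd syntax (variable name and value are placeholders):

```
rem Windows cmd
set MY_VAR=some-value
echo %MY_VAR%
```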

Re: Scala: Overload method by its class type

2015-08-29 Thread Akhil Das
This is more of a scala related question, have a look at the case classes in scala http://www.scala-lang.org/old/node/107 Thanks Best Regards On Tue, Aug 25, 2015 at 6:55 PM, saif.a.ell...@wellsfargo.com wrote: Hi all, I have SomeClass[TYPE] { def some_method(args: fixed_type_args): TYPE }
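Scala case classes let a single method dispatch on a value's runtime class via pattern matching. As a rough analogy only (not the Scala feature itself, and with a hypothetical Shape hierarchy standing in for the poster's SomeClass[TYPE]), the same dispatch can be sketched in Java with instanceof checks:

```java
// Hypothetical hierarchy, standing in for the poster's types
interface Shape {}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
}

public class Dispatch {
    // One method whose behavior depends on the argument's concrete class,
    // much like matching on case classes in Scala
    static double area(Shape s) {
        if (s instanceof Circle) {
            double r = ((Circle) s).r;
            return Math.PI * r * r;
        }
        if (s instanceof Square) {
            double d = ((Square) s).side;
            return d * d;
        }
        throw new IllegalArgumentException("unknown shape");
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(3))); // 9.0
    }
}
```

In Scala the instanceof chain collapses into a match over case classes, with the compiler checking exhaustiveness for sealed hierarchies.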