Re: Cannot read from s3 using sc.textFile

2014-10-07 Thread Sunny Khatri
Not sure if it's supposed to work. Can you try newAPIHadoopFile(), passing in the required configuration object? On Tue, Oct 7, 2014 at 4:20 AM, Tomer Benyamini tomer@gmail.com wrote: Hello, I'm trying to read from S3 using a simple Spark Java app: - SparkConf
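For context, 2014-era Spark read S3 through Hadoop's s3n filesystem, which needs AWS credentials in the Hadoop configuration. One way to supply them is via Spark's spark.hadoop.* property forwarding at submit time; a hedged sketch (the class name, jar, bucket, and path are hypothetical placeholders, and the credential key names are the standard Hadoop s3n ones):

```shell
# Sketch: forward s3n credentials into the Hadoop configuration so that
# sc.textFile("s3n://...") can authenticate. Replace the placeholders.
spark-submit \
  --class com.example.MyApp \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
  my-app.jar s3n://my-bucket/path/to/input
```

The same keys can instead be set programmatically on sc.hadoopConfiguration before the read; either way, the point is that the credentials must reach the Hadoop layer, not just the Spark one.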

Re: return probability \ confidence instead of actual class

2014-10-07 Thread Sunny Khatri
Well, apparently, the above Python set-up is wrong. Please consider the following set-up, which DOES use a 'linear' kernel... And the question remains the same: how to interpret Spark results (or why Spark results are NOT bounded between -1 and 1)? On Mon, Oct 6, 2014 at 8:35 PM, Sunny Khatri
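The reason the results are not bounded is that a linear SVM's raw output is a margin, w·x + b, which is an unbounded distance-like score rather than a probability. A common way to turn that margin into a confidence in (0, 1) is Platt-style sigmoid scaling; a minimal plain-Python sketch (the scaling parameters a and b would normally be fit on held-out data, and are fixed here purely for illustration):

```python
import math

def margin(weights, intercept, features):
    # Raw SVM decision value: w . x + b. This is an unbounded score,
    # NOT a probability, which is why it is not confined to [-1, 1].
    return sum(w * x for w, x in zip(weights, features)) + intercept

def confidence(m, a=-1.0, b=0.0):
    # Platt-style squashing of the margin into (0, 1).
    # a and b are hypothetical fixed values; in practice they are fit
    # by logistic regression on a calibration set.
    return 1.0 / (1.0 + math.exp(a * m + b))

m = margin([0.5, -2.0], 0.1, [4.0, 1.0])  # 0.5*4 - 2*1 + 0.1 = 0.1
print(m, confidence(m))
```

Larger positive margins map to confidences near 1, large negative margins to confidences near 0, and a zero margin to 0.5.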

Re: Shuffle files

2014-10-07 Thread Sunny Khatri
@SK: Make sure ulimit has taken effect as Todd mentioned. You can verify via ulimit -a. Also make sure you have proper kernel parameters set in /etc/sysctl.conf (MacOSX) On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd todd.lison...@intel.com wrote: Are you sure the new ulimit has taken effect?

Re: return probability \ confidence instead of actual class

2014-10-06 Thread Sunny Khatri
On Thu, Sep 25, 2014 at 2:25 AM, Sunny Khatri sunny.k...@gmail.com wrote: For multi-class you can use the same SVMWithSGD (for binary classification) with a One-vs-All approach, constructing respective training corpora consisting of one

Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread Sunny Khatri
You can do a filter with startswith? On Thu, Oct 2, 2014 at 4:04 PM, SK skrishna...@gmail.com wrote: Thanks for the help. Yes, I did not realize that the first header line has a different separator. By the way, is there a way to drop the first line that contains the header? Something along
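Both ideas in this exchange can be sketched in plain Python; in Spark the same lambdas would go to rdd.filter and rdd.mapPartitionsWithIndex respectively. The sample lines and header prefix below are hypothetical:

```python
lines = ["name;age", "alice,30", "bob,25"]

# Approach 1: filter out any line starting with a known header prefix.
no_header = [l for l in lines if not l.startswith("name;")]

# Approach 2 (what rdd.mapPartitionsWithIndex enables): drop element 0
# of partition 0 only, removing exactly one line regardless of content.
def drop_first(index, iterator):
    it = iter(iterator)
    if index == 0:
        next(it, None)  # skip the header in the first partition
    return it

partitions = [lines]  # pretend the whole file landed in one partition
result = [l for i, part in enumerate(partitions) for l in drop_first(i, part)]
```

The second approach is the safer one when a data line could coincidentally share the header's prefix.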

Re: return probability \ confidence instead of actual class

2014-09-24 Thread Sunny Khatri
For multi-class you can use the same SVMWithSGD (for binary classification) with a One-vs-All approach, constructing respective training corpora with class i as the positive samples and the rest of the classes as the negative ones, and then use the same method provided by Aris as a measure of how far
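The One-vs-All scheme described above can be sketched in plain Python, with simple scoring functions standing in for the per-class trained SVMWithSGD models (the weights below are hypothetical toy values, not output of any real training run):

```python
def one_vs_all_predict(models, features):
    # models: one scoring function per class label, each trained with
    # that class as positive and all other classes as negative.
    # Predict the class whose binary model reports the largest margin.
    scores = {label: score(features) for label, score in models.items()}
    return max(scores, key=scores.get)

# Toy stand-ins for three trained binary models (hypothetical weights).
models = {
    0: lambda x: 1.0 * x[0] - 0.5 * x[1],
    1: lambda x: -0.2 * x[0] + 0.8 * x[1],
    2: lambda x: 0.1 * x[0] + 0.1 * x[1],
}
print(one_vs_all_predict(models, [2.0, 1.0]))  # class 0 scores highest here
```

With k classes this trains k binary models; the per-class margins are exactly the unbounded confidence scores discussed in the other thread.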

Re: Using Hadoop InputFormat in Python

2014-08-13 Thread Sunny Khatri
Not that much familiar with the Python APIs, but you should be able to configure a job object with your custom InputFormat and pass the required configuration (i.e. job.getConfiguration()) to newAPIHadoopRDD to get the required RDD. On Wed, Aug 13, 2014 at 2:59 PM, Tassilo Klein tjkl...@gmail.com

Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
I have a class defining an inner static class (a map function). The inner class tries to refer to a variable instantiated in the outer class, which results in a NullPointerException. Sample code as follows: class SampleOuterClass { private static ArrayList<String> someVariable;

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
will be looking at its local SampleOuterClass, which may not be initialized on the remote JVM. On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri sunny.k...@gmail.com wrote: I have a class defining an inner static class (a map function). The inner class tries to refer to the variable instantiated
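The underlying issue is that a static field initialized on the driver JVM is never re-initialized on the remote executor JVMs, so the map function sees null there. The usual fix is to pass the value into the function explicitly (constructor argument or closure) so it is serialized along with the task. A language-neutral sketch of that pattern in Python (class and variable names are hypothetical, echoing the SampleOuterClass example):

```python
class SampleMapFunction:
    # Instead of reaching for outer static state (absent on a remote JVM),
    # capture the needed value at construction time so it travels with
    # the serialized function object to the workers.
    def __init__(self, some_variable):
        self.some_variable = list(some_variable)

    def __call__(self, record):
        return (record, record in self.some_variable)

fn = SampleMapFunction(["a", "b"])
print([fn(r) for r in ["a", "c"]])  # [('a', True), ('c', False)]
```

In Spark proper, a broadcast variable serves the same purpose for large read-only values, avoiding one copy per task.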

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
, 2014 at 10:56 AM, Sunny Khatri sunny.k...@gmail.com wrote: Are there any other workarounds that could be used to pass in the values from someVariable to the transformation function ? On Tue, Aug 12, 2014 at 10:48 AM, Sean Owen so...@cloudera.com wrote: I don't think static members

Spark Memory Issues

2014-08-05 Thread Sunny Khatri
Hi, I'm trying to run a Spark application with executor-memory 3G, but I'm running into the following error: 14/08/05 18:02:58 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at map at KMeans.scala:123), which has no missing parents 14/08/05 18:02:58 INFO DAGScheduler: Submitting 1

Re: Spark Memory Issues

2014-08-05 Thread Sunny Khatri
URI (as seen in the top left of the web UI) while creating the SparkContext. Thanks Best Regards On Tue, Aug 5, 2014 at 11:38 PM, Sunny Khatri sunny.k...@gmail.com wrote: Hi, I'm trying to run a Spark application with executor-memory 3G, but I'm running into the following error: 14

Re: Spark Memory Issues

2014-08-05 Thread Sunny Khatri
Yeah, ran it in yarn-cluster mode. On Tue, Aug 5, 2014 at 12:17 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Are you sure that you were not running SparkPi in local mode? Thanks Best Regards On Wed, Aug 6, 2014 at 12:43 AM, Sunny Khatri sunny.k...@gmail.com wrote: Well, I was able