Is there a way to tell if a receiver is a Reliable Receiver?

2017-04-17 Thread Justin Pihony
I can't seem to find anywhere that would let a user know if the receiver they are using is reliable or not. Even better would be a list of known reliable receivers. Are any of these things possible? Or do you just have to research your receiver beforehand?

Avro/Parquet GenericFixed decimal is not read into Spark correctly

2017-04-12 Thread Justin Pihony
All, Before creating a JIRA for this I wanted to get a sense as to whether it would be shot down or not: Take the following code: spark-shell --packages org.apache.avro:avro:1.8.1 import org.apache.avro.{Conversions, LogicalTypes, Schema} import java.math.BigDecimal val dc = new

SparkStreaming getActiveOrCreate

2017-03-18 Thread Justin Pihony
nything explicit -Justin Pihony -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkStreaming-getActiveOrCreate-tp28508.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Jar not in shell classpath in Windows 10

2017-02-28 Thread Justin Pihony
I've verified this is that issue, so please disregard. On Wed, Mar 1, 2017 at 1:07 AM, Justin Pihony <justin.pih...@gmail.com> wrote: > As soon as I posted this I found https://issues.apache.org/jira/browse/SPARK-18648 which seems to be the issue. I'm looking at it deeper

Re: Jar not in shell classpath in Windows 10

2017-02-28 Thread Justin Pihony
As soon as I posted this I found https://issues.apache.org/jira/browse/SPARK-18648 which seems to be the issue. I'm looking at it deeper now. On Wed, Mar 1, 2017 at 1:05 AM, Justin Pihony <justin.pih...@gmail.com> wrote: > Run spark-shell --packages datastax:spark-cassandra-connect

Jar not in shell classpath in Windows 10

2017-02-28 Thread Justin Pihony
Run spark-shell --packages datastax:spark-cassandra-connector:2.0.0-RC1-s_2.11 and then try to do an import of anything com.datastax. I have checked that the jar is listed among the classpaths and it is, albeit behind a spark URL. I'm wondering if added jars fail in windows due to this server

Is there a list of missing optimizations for typed functions?

2017-02-22 Thread Justin Pihony
't find anything in JIRA. Thanks, Justin Pihony -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-list-of-missing-optimizations-for-typed-functions-tp28418.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

K-Means seems biased to one center

2015-10-05 Thread Justin Pihony
(Cross post with http://stackoverflow.com/questions/32936380/k-means-clustering-is-biased-to-one-center) I have a corpus of wiki pages (baseball, hockey, music, football) which I'm running through tfidf and then through kmeans. After a couple issues to start (you can see my previous questions),

Re: Is MLBase dead?

2015-09-28 Thread Justin Pihony
To take a stab at my own answer: MLBase is now fully integrated into MLLib. MLI/MLLib are the mllib algorithms and MLO is the ml pipelines? On Mon, Sep 28, 2015 at 10:19 PM, Justin Pihony <justin.pih...@gmail.com> wrote: > As in, is MLBase (MLO/MLI/MLlib) now simply org.apache.sp

Is MLBase dead?

2015-09-28 Thread Justin Pihony
As in, is MLBase (MLO/MLI/MLlib) now simply org.apache.spark.mllib and org.apache.spark.ml? I cannot find anything official, and the last updates seem to be a year or two old. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-MLBase-dead-tp24854.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to access Spark UI through AWS

2015-08-25 Thread Justin Pihony
to: http://ec2_publicdns:20888/proxy/applicationid/jobs (9046 is the older emr port) or, as Jonathan said, the spark history server works once a job is completed. On Tue, Aug 25, 2015 at 5:26 PM, Justin Pihony justin.pih...@gmail.com wrote: OK, I figured the horrid look also...the href

Re: How to access Spark UI through AWS

2015-08-25 Thread Justin Pihony
18080. Hope that helps! ~ Jonathan On 8/24/15, 10:51 PM, Justin Pihony justin.pih...@gmail.com wrote: I am using the steps from this article https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 to get spark up and running on EMR through yarn. Once up and running I ssh

Re: How to access Spark UI through AWS

2015-08-25 Thread Justin Pihony
OK, I figured the horrid look also...the href of all of the styles is prefixed with the proxy data...so, ultimately if I can fix the proxy issues with the links, then I can fix the look also. On Tue, Aug 25, 2015 at 5:17 PM, Justin Pihony justin.pih...@gmail.com wrote: SUCCESS! I set

Re: How to access Spark UI through AWS

2015-08-25 Thread Justin Pihony
as the UI looks horrid...but I'll tackle that next :) On Tue, Aug 25, 2015 at 4:31 PM, Justin Pihony justin.pih...@gmail.com wrote: Thanks. I just tried and still am having trouble. It seems to still be using the private address even if I try going through the resource manager. On Tue, Aug 25, 2015

Re: Got wrong md5sum for boto

2015-08-24 Thread Justin Pihony
Additional info...If I use an online md5sum check then it matches...So, it's either windows or python (using 2.7.10) On Mon, Aug 24, 2015 at 11:54 AM, Justin Pihony justin.pih...@gmail.com wrote: When running the spark_ec2.py script, I'm getting a wrong md5sum. I've now seen this on two

Re: Got wrong md5sum for boto

2015-08-24 Thread Justin Pihony
at 11:58 AM, Justin Pihony justin.pih...@gmail.com wrote: Additional info...If I use an online md5sum check then it matches...So, it's either windows or python (using 2.7.10) On Mon, Aug 24, 2015 at 11:54 AM, Justin Pihony justin.pih...@gmail.com wrote: When running the spark_ec2.py script, I'm

Got wrong md5sum for boto

2015-08-24 Thread Justin Pihony
When running the spark_ec2.py script, I'm getting a wrong md5sum. I've now seen this on two different machines. I am running on windows, but I would imagine that shouldn't affect the md5. Is this a boto problem, python problem, spark problem? -- View this message in context:
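Though the thread leaves the cause open, a common source of md5 mismatches on Windows is hashing content that went through a text-mode read, which rewrites line endings. A minimal plain-Scala sketch (not from the thread) of checksumming raw bytes with the JDK's `MessageDigest`:

```scala
import java.security.MessageDigest

// Always hash the file's raw bytes, never its decoded/newline-translated text:
// a text-mode read on Windows can silently change the bytes and thus the md5.
def md5Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString

println(md5Hex("abc".getBytes("UTF-8"))) // 900150983cd24fb0d6963f7d28e17f72
```

If the published checksum matches the hash of the raw downloaded bytes, the download itself is fine and the discrepancy lies in how the bytes were read before hashing.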

How to access Spark UI through AWS

2015-08-24 Thread Justin Pihony
I am using the steps from this article https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 to get spark up and running on EMR through yarn. Once up and running I ssh in and cd to the spark bin and run spark-shell --master yarn. Once this spins up I can see that the UI is started

Re: Accumulators in Spark Streaming on UI

2015-05-26 Thread Justin Pihony
You need to make sure to name the accumulator. On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote nagmote.sne...@gmail.com wrote: Hello all, I have accumulator in spark streaming application which counts number of events received from Kafka. From the documentation , It seems Spark UI has
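For reference, a sketch of what "name the accumulator" looks like, assuming the Spark 1.x API this thread is about (`sc.accumulator`'s two-argument overload); `stream` here is a hypothetical DStream, not something from the thread:

```scala
// Spark 1.x sketch: only accumulators created with a name are displayed
// on the stage pages of the web UI. `stream` is a hypothetical DStream.
val events = sc.accumulator(0L, "eventsReceived") // the name makes it visible in the UI
stream.foreachRDD { rdd => events += rdd.count() }
```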

Re: Why is RDD to PairRDDFunctions only via implicits?

2015-05-22 Thread Justin Pihony
for K and V separately. On Fri, May 22, 2015 at 10:26 AM, Justin Pihony justin.pih...@gmail.com wrote: This ticket https://issues.apache.org/jira/browse/SPARK-4397 improved the RDD API, but it could be even more discoverable if made available via the API directly. I assume

Why is RDD to PairRDDFunctions only via implicits?

2015-05-22 Thread Justin Pihony
as the implicits remain, then compatibility remains, but now it is explicit in the docs on how to get a PairRDD and in tab completion. Thoughts? Justin Pihony
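Since the question is about the implicit-conversion pattern itself, here is a plain-Scala sketch of how an enrichment like `rddToPairRDDFunctions` works; the collection and method names are illustrative, not Spark's:

```scala
// Only sequences of pairs pick up the extra method, mirroring how Spark's
// implicit conversion gives reduceByKey only to RDD[(K, V)].
implicit class PairOps[K, V](private val pairs: Seq[(K, V)]) {
  def reduceByKeySketch(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }
}

val counts = Seq("a" -> 1, "b" -> 2, "a" -> 3).reduceByKeySketch(_ + _)
println(counts("a")) // 4: the enrichment is invisible at the call site
```

The discoverability complaint in the thread is exactly this invisibility: `reduceByKeySketch` never shows up in `Seq`'s own documentation or tab completion, only in the enrichment's.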

Re: Spark logo license

2015-05-19 Thread Justin Pihony
Thanks! On Wed, May 20, 2015 at 12:41 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Check out Apache's trademark guidelines here: http://www.apache.org/foundation/marks/ Matei On May 20, 2015, at 12:02 AM, Justin Pihony justin.pih...@gmail.com wrote: What is the license on using

Spark logo license

2015-05-19 Thread Justin Pihony
What is the license on using the spark logo. Is it free to be used for displaying commercially? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-logo-license-tp22952.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Windows DOS bug in windows-utils.cmd

2015-05-19 Thread Justin Pihony
When running something like this: spark-shell --jars foo.jar,bar.jar This keeps failing to include the tail of the jars list. Digging into the launch scripts I found that the comma makes it so that the list was sent as separate parameters. So, to keep things together, I tried

TwitterUtils on Windows

2015-05-18 Thread Justin Pihony
I am trying to print a basic twitter stream and receiving the following error: 15/05/18 22:03:14 INFO Executor: Fetching http://192.168.56.1:49752/jars/twitter4j-media-support-3.0.3.jar with timestamp 1432000973058 15/05/18 22:03:14 INFO Utils: Fetching

Re: TwitterUtils on Windows

2015-05-18 Thread Justin Pihony
I think I found the answer - http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-example-scala-application-using-spark-submit-td10056.html Do I have no way of running this in Windows locally? On Mon, May 18, 2015 at 10:44 PM, Justin Pihony justin.pih...@gmail.com wrote

Re: TwitterUtils on Windows

2015-05-18 Thread Justin Pihony
I'm not 100% sure that is causing a problem, though. The stream still starts, but is giving blank output. I checked the environment variables in the ui and it is running local[*], so there should be no bottleneck there. On Mon, May 18, 2015 at 10:08 PM, Justin Pihony justin.pih...@gmail.com wrote

Trying to understand sc.textFile better

2015-05-17 Thread Justin Pihony
All, I am trying to understand the textFile method deeply, but I think my lack of deep Hadoop knowledge is holding me back here. Let me lay out my understanding and maybe you can correct anything that is incorrect When sc.textFile(path) is called, then defaultMinPartitions is used, which
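As a sanity check on the mechanics, the split size `sc.textFile` ends up with comes from Hadoop's `FileInputFormat.computeSplitSize`, i.e. `max(minSize, min(goalSize, blockSize))` where `goalSize = totalSize / numSplits`. A plain-Scala sketch with made-up file and block sizes:

```scala
// Hadoop's split-size rule: the requested partition count only sets a goal;
// the block size caps how big a split can actually get.
def computeSplitSize(goalSize: Long, minSize: Long, blockSize: Long): Long =
  math.max(minSize, math.min(goalSize, blockSize))

val totalSize     = 1000L * 1024 * 1024 // hypothetical 1000 MB file
val blockSize     = 128L * 1024 * 1024  // common HDFS default
val minPartitions = 2                   // defaultMinPartitions = min(defaultParallelism, 2)
val goalSize  = totalSize / minPartitions
val splitSize = computeSplitSize(goalSize, minSize = 1L, blockSize = blockSize)
println(splitSize / (1024 * 1024)) // 128: so roughly 8 splits, not the 2 requested
```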

Did DataFrames break basic SQLContext?

2015-03-18 Thread Justin Pihony
I started to play with 1.3.0 and found that there are a lot of breaking changes. Previously, I could do the following: case class Foo(x: Int) val rdd = sc.parallelize(List(Foo(1))) import sqlContext._ rdd.registerTempTable("foo") Now, I am not able to directly use my RDD object and
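For readers hitting the same break: in 1.3 the conversions moved from `import sqlContext._` to `import sqlContext.implicits._`, and an RDD of case classes is converted explicitly. A sketch of the 1.3-era spark-shell equivalent (assumes a running shell, where `sc` and `sqlContext` are provided):

```scala
// Spark 1.3 spark-shell sketch; sc and sqlContext come from the shell.
case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))
import sqlContext.implicits._            // was: import sqlContext._
rdd.toDF().registerTempTable("foo")      // toDF() is the new explicit step
sqlContext.sql("SELECT x FROM foo").show()
```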

Re: Did DataFrames break basic SQLContext?

2015-03-18 Thread Justin Pihony
results in a frozen shell after this line: INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: @ (64), after : . which, locks the internally created metastore_db On Wed, Mar 18, 2015 at 11:20 AM, Justin Pihony justin.pih

Bug in Streaming files?

2015-03-14 Thread Justin Pihony
All, Looking into this StackOverflow question https://stackoverflow.com/questions/29022379/spark-streaming-hdfs/29036469 it appears that there is a bug when utilizing the newFilesOnly parameter in FileInputDStream. Before creating a ticket, I wanted to verify it here. The gist is that this

SparkSQL JSON array support

2015-03-05 Thread Justin Pihony
Are there any plans to support JSON arrays more fully? Take for example: val myJson = sqlContext.jsonRDD(List("""{"foo":[{"bar":1},{"baz":2}]}""")) myJson.registerTempTable("JsonTest") I would like a way to pull out parts of the array data based on a key sql("SELECT foo[bar] FROM JsonTest")

Re: Spark SQL Static Analysis

2015-03-04 Thread Justin Pihony
Thanks! On Wed, Mar 4, 2015 at 3:58 PM, Michael Armbrust mich...@databricks.com wrote: It is somewhat out of date, but here is what we have so far: https://github.com/marmbrus/sql-typed On Wed, Mar 4, 2015 at 12:53 PM, Justin Pihony justin.pih...@gmail.com wrote: I am pretty sure that I

Spark SQL Static Analysis

2015-03-04 Thread Justin Pihony
I am pretty sure that I saw a presentation where SparkSQL could be executed with static analysis, however I cannot find the presentation now, nor can I find any documentation or research papers on the topic. So, I am curious if there is indeed any work going on for this topic. The two things I

SQLContext.applySchema strictness

2015-02-13 Thread Justin Pihony
Per the documentation: It is important to make sure that the structure of every Row of the provided RDD matches the provided schema. Otherwise, there will be runtime exception. However, it appears that this is not being enforced. import org.apache.spark.sql._ val sqlContext = new

Re: SQLContext.applySchema strictness

2015-02-13 Thread Justin Pihony
to be scanned to give a correct answer. Thanks, Yin On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony justin.pih...@gmail.com wrote: Per the documentation: It is important to make sure that the structure of every Row of the provided RDD matches the provided schema. Otherwise, there will be runtime
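The lazy behavior described in the reply can be sketched in plain Scala (no Spark): the schema check runs inside a deferred transformation, so a bad row only fails when something finally pulls it.

```scala
// Illustrative stand-in for applySchema, not Spark code: validation happens
// inside a lazy map, so mismatches surface at evaluation time, not when the
// schema is applied.
final case class SchemaSketch(numFields: Int)

def applySchemaSketch(rows: Iterator[Seq[Any]], schema: SchemaSketch): Iterator[Seq[Any]] =
  rows.map { row =>
    require(row.length == schema.numFields,
      s"row has ${row.length} fields, schema expects ${schema.numFields}")
    row
  }

val rows    = Iterator(Seq(1, "a"), Seq(2)) // second row breaks the 2-field schema
val checked = applySchemaSketch(rows, SchemaSketch(2)) // no exception here
println(checked.next()) // List(1, a): the good row passes
// checked.next() would only now throw IllegalArgumentException
```

Eager enforcement would mean scanning the whole dataset up front, which is exactly the cost the reply points out.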