I can't seem to find anywhere that would let a user know if the receiver they
are using is reliable or not. Even better would be a list of known reliable
receivers. Are any of these things possible? Or do you just have to research
your receiver beforehand?
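For reference, the property lives in the custom receiver API rather than being advertised per receiver: a reliable receiver stores records in blocks and acks its source only after store() returns. A minimal sketch of that contract, assuming a hypothetical AckableSource (poll/ack/close are made up; Receiver, store, and isStopped are the real API):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

trait AckableSource {
  def poll(): Seq[String] // fetch a batch from the upstream system
  def ack(): Unit         // tell the upstream system the batch is safe
  def close(): Unit
}

class ReliableSketchReceiver(source: AckableSource)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = new Thread("reliable-receiver") {
    override def run(): Unit = {
      while (!isStopped) {
        val batch = source.poll()
        // store(ArrayBuffer) blocks until Spark has persisted the block,
        // so acking afterwards means no acked record can be lost
        store(ArrayBuffer(batch: _*))
        source.ack()
      }
    }
  }.start()
  def onStop(): Unit = source.close()
}

As I understand it, an unreliable receiver is one that uses the single-record store(dataItem), which buffers internally and never acks; beyond checking for that, it really is a research-your-receiver-beforehand situation.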
All,
Before creating a JIRA for this I wanted to get a sense as to whether it
would be shot down or not:
Take the following code:
spark-shell --packages org.apache.avro:avro:1.8.1
import org.apache.avro.{Conversions, LogicalTypes, Schema}
import java.math.BigDecimal
val dc = new Conversions.DecimalConversion()
… anything explicit
-Justin Pihony
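(For anyone reading along: the truncated snippet was presumably building toward Avro's decimal logical type. A sketch of the round trip, with a made-up precision and scale; toBytes/fromBytes are the actual Conversions.DecimalConversion API in Avro 1.8.1:)

import org.apache.avro.{Conversions, LogicalTypes, Schema}
import java.math.BigDecimal

// decimal(10, 2) attached to a BYTES schema; values must carry scale 2
val schema = LogicalTypes.decimal(10, 2).addToSchema(Schema.create(Schema.Type.BYTES))
val dc = new Conversions.DecimalConversion()
val buf = dc.toBytes(new BigDecimal("123.45"), schema, schema.getLogicalType)
val back = dc.fromBytes(buf, schema, schema.getLogicalType) // == 123.45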
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SparkStreaming-getActiveOrCreate-tp28508.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I've verified this is that issue, so please disregard.
On Wed, Mar 1, 2017 at 1:07 AM, Justin Pihony <justin.pih...@gmail.com>
wrote:
> As soon as I posted this I found
> https://issues.apache.org/jira/browse/SPARK-18648 which seems to be the
> issue. I'm looking at it deeper now.
As soon as I posted this I found
https://issues.apache.org/jira/browse/SPARK-18648 which seems to be the
issue. I'm looking at it deeper now.
On Wed, Mar 1, 2017 at 1:05 AM, Justin Pihony <justin.pih...@gmail.com>
wrote:
> Run spark-shell --packages
> datastax:spark-cassandra-connector:2.0.0-RC1-s_2.11 and then try to…
Run spark-shell --packages
datastax:spark-cassandra-connector:2.0.0-RC1-s_2.11 and then try to import
anything under com.datastax. I have checked that the jar is listed among the
classpaths and it is, albeit behind a spark:// URL. I'm wondering if added
jars fail on Windows due to this server… I couldn't find anything in JIRA.
Thanks,
Justin Pihony
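A quick way to see the behavior being described from inside the shell: sc.listJars() (part of the 2.x SparkContext API) reports the jars the driver's file server is hosting, which is where the spark:// prefix comes from:

// each added jar shows up behind a spark://<driver-host>:<port>/jars/... URL
sc.listJars().filter(_.contains("cassandra")).foreach(println)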
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-list-of-missing-optimizations-for-typed-functions-tp28418.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
(Cross post with
http://stackoverflow.com/questions/32936380/k-means-clustering-is-biased-to-one-center)
I have a corpus of wiki pages (baseball, hockey, music, football) which I'm
running through TF-IDF and then through k-means. After a couple of issues at
the start (you can see my previous questions)…
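Roughly the pipeline in question, as a sketch against the RDD-based mllib API of that era (the corpus path, tokenization, and k are made up):

import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.clustering.KMeans

// one whitespace-tokenized document per wiki page (hypothetical path)
val docs = sc.wholeTextFiles("wiki_pages/")
  .map { case (_, text) => text.toLowerCase.split("\\s+").toSeq }

val tf = new HashingTF().transform(docs)
tf.cache() // IDF takes two passes over the term frequencies
val tfidf = new IDF().fit(tf).transform(tf)

val model = KMeans.train(tfidf, 4, 20) // k = 4 topics, 20 iterations

If everything collapses into one center, normalizing the TF-IDF vectors (mllib's Normalizer) is a common first thing to try, since raw document-length differences can dominate the Euclidean distances.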
To take a stab at my own answer: MLBase is now fully integrated into MLlib.
MLI/MLlib became the mllib algorithms, and MLO the ml pipelines?
On Mon, Sep 28, 2015 at 10:19 PM, Justin Pihony <justin.pih...@gmail.com>
wrote:
> As in, is MLBase (MLO/MLI/MLlib) now simply org.apache.sp…
As in, is MLBase (MLO/MLI/MLlib) now simply org.apache.spark.mllib and
org.apache.spark.ml? I cannot find anything official, and the last updates
seem to be a year or two old.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-MLBase-dead-tp24854.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
…to:
http://ec2_publicdns:20888/proxy/applicationid/jobs (9046 is the older
EMR port)
or, as Jonathan said, the Spark history server works once a job is
completed.
On Tue, Aug 25, 2015 at 5:26 PM, Justin Pihony justin.pih...@gmail.com
wrote:
OK, I figured the horrid look also… the href…
…18080. Hope that helps!
~ Jonathan
On 8/24/15, 10:51 PM, Justin Pihony justin.pih...@gmail.com wrote:
I am using the steps from this article
https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 to
get spark up and running on EMR through yarn. Once up and running I ssh
OK, I figured the horrid look also… the href of all of the styles is
prefixed with the proxy data… so, ultimately, if I can fix the proxy
issues with the links, then I can fix the look also.
On Tue, Aug 25, 2015 at 5:17 PM, Justin Pihony justin.pih...@gmail.com
wrote:
SUCCESS! I set…
…as the UI looks horrid… but I'll tackle that next :)
On Tue, Aug 25, 2015 at 4:31 PM, Justin Pihony justin.pih...@gmail.com
wrote:
Thanks. I just tried and am still having trouble. It seems to still be
using the private address even if I go through the resource manager.
On Tue, Aug 25, 2015…
Additional info... If I use an online md5sum check then it matches... So,
it's either Windows or Python (using 2.7.10).
On Mon, Aug 24, 2015 at 11:54 AM, Justin Pihony justin.pih...@gmail.com
wrote:
When running the spark_ec2.py script, I'm getting a wrong md5sum. I've now
seen this on two different machines.
When running the spark_ec2.py script, I'm getting a wrong md5sum. I've now
seen this on two different machines. I am running on Windows, but I would
imagine that shouldn't affect the md5. Is this a boto problem, a Python
problem, or a Spark problem?
I am using the steps from this article
https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 to
get Spark up and running on EMR through YARN. Once up and running, I ssh in,
cd to the Spark bin directory, and run spark-shell --master yarn. Once this
spins up, I can see that the UI is started…
You need to make sure to name the accumulator.
On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote nagmote.sne...@gmail.com
wrote:
Hello all,
I have an accumulator in a Spark Streaming application which counts the
number of events received from Kafka.
From the documentation, it seems the Spark UI has…
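To make the first reply concrete: in the 1.x API, the name is the second argument to sc.accumulator, and only named accumulators show up on the UI's stage pages. A sketch, with `stream` standing in for the Kafka DStream (not shown here):

val eventCount = sc.accumulator(0L, "events received from Kafka") // name shown in the UI
// stream: the DStream from KafkaUtils.createStream (from Snehal's setup)
stream.foreachRDD { rdd =>
  rdd.foreach(_ => eventCount += 1L)
}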
…for K and V separately.
On Fri, May 22, 2015 at 10:26 AM, Justin Pihony justin.pih...@gmail.com
wrote:
This ticket (https://issues.apache.org/jira/browse/SPARK-4397) improved
the RDD API, but it could be even more discoverable if made available via
the API directly. I assume that as long as the implicits remain,
compatibility remains, but now it would be explicit in the docs how to get
a PairRDD, and in tab completion.
Thoughts?
Justin Pihony
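For context, a sketch of the discoverability problem being described: the pair operations compile because of the implicit conversion to PairRDDFunctions, but nothing on RDD itself reveals them.

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
// reduceByKey is not a member of RDD; it arrives via rddToPairRDDFunctions
pairs.reduceByKey(_ + _).collect() // -> (a,4), (b,2), in some order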
Thanks!
On Wed, May 20, 2015 at 12:41 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Check out Apache's trademark guidelines here:
http://www.apache.org/foundation/marks/
Matei
On May 20, 2015, at 12:02 AM, Justin Pihony justin.pih...@gmail.com
wrote:
What is the license on using the Spark logo? Is it free to use for
commercial display?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-logo-license-tp22952.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
When running something like this:
spark-shell --jars foo.jar,bar.jar
This keeps failing to include the tail of the jars list. Digging into the
launch scripts, I found that the comma causes the list to be sent as
separate parameters. So, to keep things together, I tried…
I am trying to print a basic twitter stream and receiving the following
error:
15/05/18 22:03:14 INFO Executor: Fetching
http://192.168.56.1:49752/jars/twitter4j-media-support-3.0.3.jar with
timestamp 1432000973058
15/05/18 22:03:14 INFO Utils: Fetching…
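For reference, a minimal version of this kind of job, assuming the spark-streaming-twitter artifact is on the classpath and the twitter4j OAuth credentials are set as system properties:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val ssc = new StreamingContext(sc, Seconds(5))
val tweets = TwitterUtils.createStream(ssc, None) // None = read creds from twitter4j properties
tweets.map(_.getText).print()
ssc.start()
ssc.awaitTermination()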
I think I found the answer -
http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-example-scala-application-using-spark-submit-td10056.html
Is there really no way to run this locally on Windows?
On Mon, May 18, 2015 at 10:44 PM, Justin Pihony justin.pih...@gmail.com
wrote:
I'm not 100% sure that is causing a problem, though. The stream still
starts, but gives blank output. I checked the environment variables in
the UI and it is running local[*], so there should be no bottleneck there.
On Mon, May 18, 2015 at 10:08 PM, Justin Pihony justin.pih...@gmail.com
wrote:
All,
I am trying to understand the textFile method deeply, but I think my
lack of deep Hadoop knowledge is holding me back here. Let me lay out my
understanding and maybe you can correct anything that is incorrect:
When sc.textFile(path) is called, then defaultMinPartitions is used,
which is math.min(defaultParallelism, 2)…
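A small check of that understanding (the path is hypothetical; minPartitions is only a hint handed down to the Hadoop InputFormat):

val rdd = sc.textFile("hdfs:///some/big/file.txt", minPartitions = 8)
// the real count comes from FileInputFormat.getSplits; for a splittable
// file it is usually >= the hint, but the InputFormat has the final say
println(rdd.partitions.length)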
I started to play with 1.3.0 and found that there are a lot of breaking
changes. Previously, I could do the following:
case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))
import sqlContext._
rdd.registerTempTable("foo")
Now, I am not able to directly use my RDD object and…
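For anyone hitting the same wall, the 1.3 shape of the above appears to be the following, with toDF() as the new explicit step and the implicits coming from sqlContext.implicits (assuming the usual sqlContext in the shell):

case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))
import sqlContext.implicits._
rdd.toDF().registerTempTable("foo") // registerTempTable now lives on DataFrame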
…results in a frozen shell after this line:
INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on
mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
which locks the internally created metastore_db.
On Wed, Mar 18, 2015 at 11:20 AM, Justin Pihony justin.pih
All,
Looking into this StackOverflow question
https://stackoverflow.com/questions/29022379/spark-streaming-hdfs/29036469
it appears that there is a bug when utilizing the newFilesOnly parameter in
FileInputDStream. Before creating a ticket, I wanted to verify it here. The
gist is that this…
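For reference, the overload in question, sketched with a made-up directory (the three-argument fileStream is where newFilesOnly is exposed):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(30))
val files = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "hdfs:///incoming", (p: Path) => true, newFilesOnly = false)
// newFilesOnly = false should also pick up files already in the directory
files.map(_._2.toString).print()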
Are there any plans to support JSON arrays more fully? Take for example:
val myJson =
  sqlContext.jsonRDD(sc.parallelize(List("""{"foo":[{"bar":1},{"baz":2}]}""")))
myJson.registerTempTable("JsonTest")
I would like a way to pull out parts of the array data based on a key:
sql("SELECT foo[bar] FROM JsonTest")
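The foo[bar] form isn't valid syntax today; what should already work against that table is positional access plus struct field extraction, which is worth contrasting with the key-based lookup being requested:

// first element of the array, then its bar field
sqlContext.sql("SELECT foo[0].bar FROM JsonTest").collect()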
Thanks!
On Wed, Mar 4, 2015 at 3:58 PM, Michael Armbrust mich...@databricks.com
wrote:
It is somewhat out of date, but here is what we have so far:
https://github.com/marmbrus/sql-typed
On Wed, Mar 4, 2015 at 12:53 PM, Justin Pihony justin.pih...@gmail.com
wrote:
I am pretty sure that I…
I am pretty sure that I saw a presentation where Spark SQL could be executed
with static analysis; however, I cannot find the presentation now, nor can I
find any documentation or research papers on the topic. So, I am curious
whether there is indeed any work going on in this area. The two things I…
Per the documentation:
It is important to make sure that the structure of every Row of the
provided RDD matches the provided schema. Otherwise, there will be runtime
exception.
However, it appears that this is not being enforced.
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
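A sketch of the behavior being reported, with a deliberately wrong row (1.3-style createDataFrame; in older releases the same applies to applySchema):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val schema = StructType(Seq(StructField("x", IntegerType)))
val rows = sc.parallelize(Seq(Row("not an int")))
val df = sqlContext.createDataFrame(rows, schema) // accepted: no eager validation
df.select(df("x") + 1).collect() // only blows up once x is actually evaluated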
…to be scanned to give a correct answer.
Thanks,
Yin
On Fri, Feb 13, 2015 at 1:33 PM, Justin Pihony justin.pih...@gmail.com
wrote:
Per the documentation:
It is important to make sure that the structure of every Row of the
provided RDD matches the provided schema. Otherwise, there will be runtime
exception.