Dear All,
My code works fine with JSON input data. When I tried the Parquet data
format, it worked for English data, but for Japanese text I am getting the
stack trace below. Please help!
Caused by: java.lang.ClassCastException: optional binary element (UTF8) is not
a group
at org.apache.parque
(cc'ing dev list also)
I think a more general version of ranking metrics that allows arbitrary
relevance scores could be useful. Ranking metrics also apply to settings
such as search and other learning-to-rank use cases, so the API should be a
bit more generic than the pure recommender setting.
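For reference, a minimal sketch of the existing spark.mllib RankingMetrics API, which only takes arrays of IDs (so relevance is effectively binary); "sc" is an assumed SparkContext and the data is made up:

import org.apache.spark.mllib.evaluation.RankingMetrics

// Each record is (predicted IDs in ranked order, ground-truth relevant IDs);
// there is no way to attach a graded relevance score to the ground truth.
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4, 5), Array(2, 4)),
  (Array(6, 7, 8), Array(8))
))
val metrics = new RankingMetrics(predictionAndLabels)
println(metrics.precisionAt(3))
println(metrics.meanAveragePrecision)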
Hi all,
I'm using the Spark SQL Thrift Server under Spark 1.3.1 to run Hive SQL queries. I
started the Thrift Server with ./sbin/start-thriftserver.sh --master
yarn-client --num-executors 12 --executor-memory 5g --driver-memory 5g, then
sent continuous Hive SQL queries to it.
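For context, a minimal sketch of how a client might send a query to that Thrift Server over JDBC, assuming the default port 10000 and the Hive JDBC driver on the classpath (host, port, and table name are placeholders):

import java.sql.DriverManager

// Register the Hive JDBC driver and open a connection to the Thrift Server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table")
while (rs.next()) println(rs.getLong(1))
conn.close()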
However
These types of questions would be better asked on the user mailing list for
the Spark Cassandra connector:
http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
Version compatibility can be found here:
https://github.com/datastax/spark-cassandra-connector#version-compa
Hi,
I have an RDD of n rows. I want to transform it to a JSON RDD and also
add some more information. Any idea how to accomplish this?
Example:
I have an RDD with n rows containing data like the below:
16.9527493170273,20.1989561393151,15.7065424947394
17.9527493170273,21.1989561393151,15.70654249
Enrich the RDD first with the extra information and then map it to a case
class, if you are using Scala.
You can then use the Play API classes
(play.api.libs.json.Writes / play.api.libs.json.Json) to convert the
mapped case class to JSON, as in the sketch below.
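A minimal sketch of that approach; "rdd" is assumed to be an RDD[String] with rows like the sample above, and the case class fields and the "enriched" label are made up for illustration:

import play.api.libs.json.{Json, Writes}

// Hypothetical case class holding the parsed values plus the extra information.
case class Record(a: Double, b: Double, c: Double, label: String)

val jsonRdd = rdd.mapPartitions { lines =>
  // Build the Writes inside the partition to avoid closure-serialization issues.
  implicit val recordWrites: Writes[Record] = Json.writes[Record]
  lines.map { line =>
    val Array(a, b, c) = line.split(",").map(_.trim.toDouble)
    Json.toJson(Record(a, b, c, label = "enriched")).toString // one JSON string per row
  }
}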
Thanks
Deepak
On Tue, Sep 20, 2016 at 6:42 PM, sujeet jog wro
Have you checked to see if any files already exist at /nfspartition/sankar/
banking_l1_v2.csv? If so, you will need to delete them before attempting to
save your DataFrame to that location. Alternatively, you may be able to
specify the "mode" setting of the df.write operation to "overwrite",
depend
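For reference, a minimal sketch of that overwrite option with the Scala DataFrame API; the path is the one from this thread, and "df" stands for whatever DataFrame is being saved:

// Overwrite any existing output at the target path instead of failing.
df.write
  .mode("overwrite")
  .format("csv")
  .option("header", "true")
  .save("/nfspartition/sankar/banking_l1_v2.csv")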
Hey Kevin,
It is an empty directory. Spark is able to write the part files to the directory,
but while merging those part files we are getting the above error.
Regards
On Tue, Sep 20, 2016 at 7:46 PM, Kevin Mellott
wrote:
> Have you checked to see if any files already exist at
> /nfspartition/sankar/banki
Is this a bug?
On Sep 19, 2016 10:10 PM, "janardhan shetty" wrote:
> Hi,
>
> I am hitting this issue. https://issues.apache.org/jira/browse/SPARK-10835
> .
>
> The issue seems to be resolved but is resurfacing in 2.0 ML. Any workaround
> would be appreciated.
>
> Note:
> The pipeline has NGram before Word2Vec.
>
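(For context, a minimal sketch of the pipeline shape described in the note; the stage parameters and column names are illustrative, not taken from the thread.)

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{NGram, Word2Vec}

// NGram produces an array-of-strings column that Word2Vec then consumes.
val ngram = new NGram().setN(2).setInputCol("tokens").setOutputCol("ngrams")
val word2Vec = new Word2Vec().setInputCol("ngrams").setOutputCol("features").setVectorSize(100)
val pipeline = new Pipeline().setStages(Array(ngram, word2Vec))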
Ah, I think that this was supposed to be changed with SPARK-9062. Let
me see about reopening 10835 and addressing it.
On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
wrote:
> Is this a bug?
>
> On Sep 19, 2016 10:10 PM, "janardhan shetty" wrote:
>>
>> Hi,
>>
>> I am hitting this issue.
>> http
Thanks Sean.
On Sep 20, 2016 7:45 AM, "Sean Owen" wrote:
> Ah, I think that this was supposed to be changed with SPARK-9062. Let
> me see about reopening 10835 and addressing it.
>
> On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty
> wrote:
> > Is this a bug?
> >
> > On Sep 19, 2016 10:10 PM, "
-dev +user
That warning pretty much means what it says: the consumer tried to
get records for the given partition/offset, and couldn't do so after
polling the Kafka broker for X amount of time.
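For reference, that poll timeout is configurable in the spark-streaming-kafka-0-10 integration; the value below is only an example, and raising it treats the symptom rather than the underlying broker/load issue:

import org.apache.spark.SparkConf

// Raise the consumer poll timeout used when fetching records for a partition/offset.
val conf = new SparkConf()
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")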
If that only happens when you put additional load on Kafka via
producing, the first thing I'd do is
You can probably just do an identity transformation on the column to
make its type a nullable String array -- ArrayType(StringType, true).
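A minimal sketch of that identity transformation; "ngrams" is an assumed column name and "df" the DataFrame feeding the pipeline:

import org.apache.spark.sql.functions.{col, udf}

// Re-emit the array through a UDF; the resulting column type is
// ArrayType(StringType, containsNull = true), i.e. a nullable String array.
val asNullableArray = udf((xs: Seq[String]) => xs)
val fixed = df.withColumn("ngrams", asNullableArray(col("ngrams")))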
Of course, I'm not sure why Word2Vec must reject a non-null array type
when it can of course handle nullable, but the previous discussion
indicated that this ha
Hi Sean,
Any suggestions for a workaround as of now?
On Sep 20, 2016 7:46 AM, "janardhan shetty" wrote:
> Thanks Sean.
> On Sep 20, 2016 7:45 AM, "Sean Owen" wrote:
>
>> Ah, I think that this was supposed to be changed with SPARK-9062. Let
>> me see about reopening 10835 and addressing it.
>>
>>
Hi Yan, I agree, it IS really confusing. Here is the technique for
transforming a column. It is very general because you can make "myConvert"
do whatever you want.
import org.apache.spark.mllib.linalg.Vectors
val df = Seq((0, "[1,3,5]"), (1, "[2,4,6]")).toDF
df.show()
// The columns were named _1 and _2 by default
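Continuing the example, a sketch of one possible "myConvert" (here it parses the string into an MLlib dense vector; substitute whatever conversion you need):

import org.apache.spark.sql.functions.udf

val myConvert = udf((s: String) =>
  Vectors.dense(s.stripPrefix("[").stripSuffix("]").split(",").map(_.toDouble)))
df.withColumn("_2", myConvert(df("_2"))).show()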
Hi,
I am trying to create an external table WITH partitions using Spark.
Currently I am using catalog.createExternalTable(cleanTableName, "ORC",
schema, Map("path" -> s"s3://$location/"))
Does createExternalTable have options that can create a table with
partitions? I assume it would be a key-
Can you please post the line of code that is doing the df.write command?
On Tue, Sep 20, 2016 at 9:29 AM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Hey Kevin,
>
> It is an empty directory. Spark is able to write the part files to the directory
> but while merging those part files we
Please find the code below.
sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
I tried these two commands.
write.df(sankar2,"/nfspartition/sankar/test/test.csv","csv",header="true")
saveDF(sankar2,"sankartest.csv",source="csv",mode="append",schema="true")
On Tue, Sep 20, 2016 a
Hi,
I've been struggling with the reliability of what should, on the face of it, be
a fairly simple job: given a number of events, group them by user, event
type and the week in which they occurred, and aggregate their counts.
The input data is fairly skewed so I do a repartition and add a salt to the
k
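A sketch of the salt-and-reaggregate pattern mentioned above ("events" and the column names are assumptions): aggregate once with a random salt in the grouping key to spread the skewed groups, then aggregate again without the salt.

import org.apache.spark.sql.functions._

val salted  = events.withColumn("salt", (rand() * 32).cast("int"))
val partial = salted.groupBy(col("user"), col("eventType"), col("week"), col("salt")).count()
val counts  = partial.groupBy(col("user"), col("eventType"), col("week")).agg(sum("count").as("count"))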
Hi Frank,
Which version of Spark are you using? Also, can you share more information
about the exception?
If it’s not confidential, you can send the data sample to me
(yuhao.y...@intel.com) and I can try to investigate.
Regards,
Yuhao
From: Frank Zhang [mailto:dataminin...@yahoo.com.INVALID]
S
Instead of *mode="append"*, try *mode="overwrite"*
On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Please find the code below.
>
> sankar2 <- read.df("/nfspartition/sankar/test/2016/08/test.json")
>
> I tried these two commands.
> write.df(sankar2
I used that one also
On Sep 20, 2016 10:44 PM, "Kevin Mellott" wrote:
> Instead of *mode="append"*, try *mode="overwrite"*
>
> On Tue, Sep 20, 2016 at 11:30 AM, Sankar Mittapally creditvidya.com> wrote:
>
>> Please find the code below.
>>
>> sankar2 <- read.df("/nfspartition/sankar/test/2016/08
A few options include:
https://github.com/marufaytekin/lsh-spark - I've used this a bit and it
seems quite scalable too from what I've looked at.
https://github.com/soundcloud/cosine-lsh-join-spark - not used this but
looks like it should do exactly what you need.
https://github.com/mrsqueeze/*spa
Hi Yuhao,
Thank you so much for your great contribution to the LDA and other Spark
modules!
I use both Spark 1.6.2 and 2.0.0. The data I used originally is very large; it
has tens of millions of documents. But for testing purposes, the data set I
mentioned earlier ("/data/mllib/sample_lda_d
Probably this: https://issues.apache.org/jira/browse/SPARK-13258
As described in the JIRA, the workaround is to use SPARK_JAVA_OPTS
On Mon, Sep 19, 2016 at 5:07 PM, sagarcasual .
wrote:
> Hello,
> I have my Spark application running in cluster mode in CDH with
> extraJavaOptions.
> However when I a
Thanks Nick - those examples will help a ton!!
On Tue, Sep 20, 2016 at 12:20 PM, Nick Pentreath
wrote:
> A few options include:
>
> https://github.com/marufaytekin/lsh-spark - I've used this a bit and it
> seems quite scalable too from what I've looked at.
> https://github.com/soundcloud/cosine-
I'm using Spark 2.0.
I've created a dataset from a Parquet file, repartitioned it on one of the
columns (docId), and persisted the repartitioned dataset.
val om = ds.repartition($"docId").persist(StorageLevel.MEMORY_AND_DISK)
When I try to confirm the partitioner, with
om.rdd.partitioner
I get
Op
Are you able to manually delete the folder below? I'm wondering if there is
some sort of non-Spark factor involved (permissions, etc).
/nfspartition/sankar/banking_l1_v2.csv
On Tue, Sep 20, 2016 at 12:19 PM, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> I used that one also
>
>
Using the Soundcloud implementation of LSH, I was able to process a 22K
product dataset in a mere 65 seconds! Thanks so much for the help!
On Tue, Sep 20, 2016 at 1:15 PM, Kevin Mellott
wrote:
> Thanks Nick - those examples will help a ton!!
>
> On Tue, Sep 20, 2016 at 12:20 PM, Nick Pentreath
Related question: is there anything that does scalable matrix
multiplication on Spark? For example, we have that long list of vectors
and want to construct the similarity matrix: v * T(v). In R it would be: v
%*% t(v)
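For what it's worth, a minimal sketch of one way to compute v %*% t(v) distributed on Spark with mllib's block matrices, assuming "rows" is an RDD[IndexedRow] built from that list of vectors:

import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Distributed equivalent of v %*% t(v); block sizes are left at their defaults.
val mat = new IndexedRowMatrix(rows).toBlockMatrix()
val gram = mat.multiply(mat.transpose)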
Thanks,
Pete
On Mon, Sep 19, 2016 at 3:49 PM, Kevin Mellott
wrote:
> Hi a
Yeah I can do all operations on that folder
On Sep 21, 2016 12:15 AM, "Kevin Mellott" wrote:
> Are you able to manually delete the folder below? I'm wondering if there
> is some sort of non-Spark factor involved (permissions, etc).
>
> /nfspartition/sankar/banking_l1_v2.csv
>
> On Tue, Sep 20, 2
Which Spark version, please?
On 21 September 2016 at 09:46, Sankar Mittapally <
sankar.mittapa...@creditvidya.com> wrote:
> Yeah I can do all operations on that folder
>
> On Sep 21, 2016 12:15 AM, "Kevin Mellott"
> wrote:
>
>> Are you able to manually delete the folder below? I'm wondering if there
>> i
My code works fine with the JSON input format (Spark 1.6 on Amazon EMR,
emr-5.0.0). I tried the Parquet format and it works fine for English data, but
when I tried it with some Japanese-language text, I am getting this
weird stack trace:
*Caused by: java.lang.ClassCastException: optional binary
Hello,
Please add a link on the Spark community page (
https://spark.apache.org/community.html)
to the Israel Spark Meetup (https://www.meetup.com/israel-spark-users/).
We're an active meetup group, unifying the local Spark user community, and
holding regular meetups.
Thanks!
Romi K.