Hi Pavan,
According to the ASF Source Header and Copyright Notice Policy [1], code
directly submitted to the ASF should include the Apache license header
without any additional copyright notice.
Kent Yao
[1] https://www.apache.org/legal/src-headers.html#headers
Sean Owen wrote on Tue, Jul 25, 2023 at 07:22:
Hi all,
The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.5.0-incubating has been released!
Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of Apache Spark
and
https://github.com/apache/spark/tags
Bests,
Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
Please refer to http://spark.apache.org/docs/latest/api/sql/index.html#version
Kent Yao @ Data Science Center, Hangzhou Research Institute
Hi Pankaj,
Have you tried spark.sql.parquet.respectSummaryFiles=true?
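(Not from the original reply; a minimal sketch of setting that switch on an existing SparkSession, with a hypothetical path:)
'''
// honor Parquet summary files (_metadata / _common_metadata) when reading
spark.conf.set("spark.sql.parquet.respectSummaryFiles", "true")
val df = spark.read.parquet("hdfs:///path/to/table")
'''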
Bests,
Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
Congrats, all!
Bests,
Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
Hi Riccardo,
Right now, Spark does not support low-latency predictions in production.
MLeap is an alternative and has been used in many scenarios. But it's good
to see that the Spark community has decided to provide such support.
On Wed, Jan 23, 2019 at 7:53 AM Riccardo Ferrari wrote:
> Felix,
anager"
> thread, but I don't see that one in your list.
>
> On Wed, Jan 16, 2019 at 12:08 PM Pola Yao wrote:
> >
> > Hi Marcelo,
> >
> > Thanks for your response.
> >
> > I have dumped the threads on the server where I submitted the Spark
> > application.
Marcelo Vanzin wrote:
> If System.exit() doesn't work, you may have a bigger problem
> somewhere. Check your threads (using e.g. jstack) to see what's going
> on.
>
> On Wed, Jan 16, 2019 at 8:09 AM Pola Yao wrote:
> >
> > Hi Marcelo,
> >
> > Thanks for your response.
> If something is creating a non-daemon thread that stays alive somewhere,
> you'll see that.
>
> Or you can force quit with sys.exit.
>
> On Tue, Jan 15, 2019 at 1:30 PM Pola Yao wrote:
> >
> > I submitted a Spark job through the ./spark-submit command; the code was
> > executed successfully, however, the application got stuck when trying to
> > quit Spark.
I submitted a Spark job through the ./spark-submit command; the code was
executed successfully, however, the application got stuck when trying to
quit Spark.
My code snippet:
'''
val spark = SparkSession.builder.master(...).getOrCreate
val pool = Executors.newFixedThreadPool(3)
implicit val xc = ExecutionContext.fromExecutorService(pool) // truncated in the archive; this is the usual completion
'''
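For what it's worth, the symptom matches Marcelo's non-daemon-thread diagnosis: Executors.newFixedThreadPool creates non-daemon threads, so the JVM cannot exit until the pool is shut down. A minimal sketch of the shutdown order (assuming the snippet above):
'''
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

val pool = Executors.newFixedThreadPool(3)
implicit val xc = ExecutionContext.fromExecutorService(pool)
try {
  // ... submit work on xc and await the results ...
} finally {
  pool.shutdown() // without this, the non-daemon pool threads keep the JVM alive
  spark.stop()
}
'''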
Hi Spark Community,
I was using XGBoost-spark to train a machine learning model. The dataset
was not large (around 1G). And I used the following command to submit my
application:
'''
./bin/spark-submit --master yarn --deploy-mode client --num-executors 50
--executor-cores 2 --executor-memory 3g
Hello Spark Community,
I have a dataset of size 20G, with 20 columns. Each column is categorical, so I
applied string-indexer and one-hot-encoding on every column. After that, I
applied vector-assembler on all the newly derived columns to form a feature
vector for each record, and then fed the feature
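For reference, a minimal sketch of that preprocessing chain with the Spark 2.x-era ML Pipeline API (df and its column list are assumptions):
'''
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val catCols = df.columns // assuming all 20 columns are categorical

val indexers = catCols.map(c => new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx"))
val encoders = catCols.map(c => new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec"))
val assembler = new VectorAssembler()
  .setInputCols(catCols.map(c => s"${c}_vec"))
  .setOutputCol("features")

val stages: Array[PipelineStage] = indexers ++ encoders :+ assembler
val features = new Pipeline().setStages(stages).fit(df).transform(df)
'''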
Hi Community,
I have a 1T dataset which contains records for 50 users. Each user has about
20G of data on average.
I wanted to use Spark to train a machine learning model (e.g., an XGBoost tree
model) for each user. Ideally, the result should be 50 models. However,
it'd be infeasible to submit 50 Spark jobs
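One common pattern (a sketch, not from the thread): run a single application and loop over the users, filtering each user's slice and fitting one model at a time; userId, df and train(...) are all hypothetical names:
'''
val userIds = df.select("userId").distinct().collect().map(_.getString(0))

val models = userIds.map { id =>
  val userData = df.filter(df("userId") === id).cache()
  val model = train(userData) // hypothetical per-user training routine
  userData.unpersist()
  id -> model
}.toMap
'''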
You are essentially doing document clustering. K-means will do it. You do have to
specify the number of clusters up front.
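A minimal sketch of that suggestion with Spark ML, assuming the documents are already vectorized into a "features" column and k is chosen up front:
'''
import org.apache.spark.ml.clustering.KMeans

val kmeans = new KMeans().setK(10).setFeaturesCol("features") // k must be given up front
val model = kmeans.fit(docVectors)          // docVectors: DataFrame with a "features" column
val clustered = model.transform(docVectors) // adds a "prediction" cluster-id column
'''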
From: "Donni Khan"
an error exit code?
>
> You could set checkCode to True
> https://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=pipe#pyspark.RDD.pipe
>
> Otherwise maybe you want to output the status into stdout so you could
> process it individually.
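The "output the status into stdout" idea, sketched in Scala (checkCode=True is the PySpark route; the wrapper command here is an assumption, and run.sh must exist on every worker):
'''
val out = rdd.pipe(Seq("bash", "-c", "./run.sh; echo EXIT_STATUS:$?"))
val statuses = out.filter(_.startsWith("EXIT_STATUS:"))
  .map(_.stripPrefix("EXIT_STATUS:").toInt)
  .collect() // pipe runs once per partition, so one status per partition
'''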
Hello Community,
I have the following Python code that calls an external command:
rdd.pipe('run.sh', env=os.environ).collect()
run.sh can either exit with status 1 or 0, how could I get the exit code
from Python? Thanks!
Xuchen
I have a similar observation with 1.4.1, where the 3rd stage running
mapPartitionsWithIndex at Word2Vec.scala:312 seems to run with a single
thread (which takes forever for a reasonably large corpus). Can anyone help
explain whether this is an algorithm limitation or whether the model parameters can be
There is a problem with starting the shell in yarn-client mode. I am
working with HDP 2.2.6, which runs Hadoop 2.6.
-Yao
Thanks. I wonder why this is not widely reported in the user forum. The REPL
shell is basically broken in 1.5.0 and 1.5.1.
-Yao
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, October 25, 2015 12:01 PM
To: Ge, Yao (Y.)
Cc: user
Subject: Re: Spark scala REPL - Unable to create sqlContext
-Yao
Thank you for this suggestion! But may I ask what the advantage of using
checkpoint instead of cache here is? Because they both cut lineage. I only know
that checkpoint saves the RDD to disk, while cache keeps it in memory. So maybe
it's for reliability?
Also on
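For context, a minimal sketch of the two side by side (paths hypothetical): cache keeps blocks in executor memory and still needs the lineage if blocks are lost, while checkpoint writes to reliable storage and truncates the lineage afterwards:
'''
sc.setCheckpointDir("hdfs:///tmp/checkpoints") // hypothetical path

val data = sc.textFile("hdfs:///input").map(_.split(","))
data.cache()       // in-memory blocks; recomputed from lineage if evicted or lost
data.checkpoint()  // saved to the checkpoint dir; lineage is cut after the next action
data.count()       // the first action materializes both
'''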
I found the TF-IDF feature extraction and all the MLlib code that works with
pure Vector RDDs very difficult to work with, due to the lack of ability to
associate a vector back to the original data. Why can't Spark MLlib support
LabeledPoint?
Can you show how to do IDF transform on tfWithId? Thanks.
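Not from the original thread, but one common workaround: keep the ids next to the vectors and zip them back after the transform, relying on IDF.transform preserving row order (docs and its shape are assumptions):
'''
import org.apache.spark.mllib.feature.{HashingTF, IDF}

// docs: RDD[(Long, Seq[String])] -- (document id, tokens)
val ids = docs.map(_._1)
val tf = new HashingTF().transform(docs.map(_._2))
tf.cache()
val tfidf = new IDF().fit(tf).transform(tf)
val tfidfWithId = ids.zip(tfidf) // RDD[(Long, Vector)]
'''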
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble /
  data.count
println("Training Error = " + trainErr)
println("Learned classification tree model:\n" + model)
-Yao
Can anyone provide an example code of using Categorical Features in Decision
Tree?
Thanks!
-Yao
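Not an authoritative answer, but a minimal MLlib sketch: categorical features are declared via categoricalFeaturesInfo, mapping feature index to its number of categories (data and the category counts are assumptions):
'''
import org.apache.spark.mllib.tree.DecisionTree

// data: RDD[LabeledPoint]; categorical values must be encoded as 0.0 .. k-1
// here: feature 0 has 4 categories, feature 2 has 3, the rest are continuous
val categoricalFeaturesInfo = Map(0 -> 4, 2 -> 3)

// args: input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins
val model = DecisionTree.trainClassifier(data, 2, categoricalFeaturesInfo, "gini", 5, 32)
'''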
Hey, guys. Here's my problem:
While using the standalone mode, I always use the following args for
executor:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc
-Xloggc:/tmp/spark.executor.gc.log
But as we know, the HotSpot JVM does not support variable substitution in the
-Xloggc parameter, which
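For reference, the flags are usually injected per process via extraJavaOptions (a sketch only; it does not solve the substitution problem above, so every executor on a host writes to the same file):
'''
import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.executor.extraJavaOptions",
  "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -Xloggc:/tmp/spark.executor.gc.log")
'''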
Hi,
Amazon AWS started to provide service for mainland China; the region
name is cn-north-1. But the script Spark provides, spark_ec2.py, will query
the AMI id from https://github.com/mesos/spark-ec2/tree/v4/ami-list and there's
no AMI information for the cn-north-1 region.
Can anybody update the
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
cn-north-1 is not a supported region for EC2, as far as I can tell. There
may be other AWS services that can use that region, but spark-ec2 relies on
EC2.
Nick
On Tue, Nov 4, 2014 at 8:09 PM, haitao.yao <yao.e...@gmail.com> wrote:
https://issues.apache.org/jira/secure/Dashboard.jspa to track this
request? I can do it if you've never opened a JIRA issue before.
Nick
On Tue, Nov 4, 2014 at 9:03 PM, haitao.yao <yao.e...@gmail.com> wrote:
I'm afraid not. We have been using EC2 instances in cn-north-1 region for
a while. And the latest version of boto
I am working with Spark 1.1.0 and I believe Timestamp is a supported data type
for Spark SQL. However, I keep getting this MatchError for java.sql.Timestamp
when I try to use reflection to register a Java Bean with a Timestamp field.
Is anything wrong with my code below?
public
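The bean code is truncated above; for comparison, a Scala sketch against the Spark 1.1-era API, which is one way to check whether Timestamp itself is the problem (names hypothetical):
'''
import java.sql.Timestamp
import org.apache.spark.sql.SQLContext

case class Event(name: String, ts: Timestamp)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion

val events = sc.parallelize(Seq(Event("a", new Timestamp(System.currentTimeMillis))))
events.registerTempTable("events")
sqlContext.sql("SELECT name, ts FROM events").collect()
'''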
(RemoteTestRunner.java:197)
From: Wang, Daoyuan [mailto:daoyuan.w...@intel.com]
Sent: Sunday, October 19, 2014 10:31 AM
To: Ge, Yao (Y.); user@spark.apache.org
Subject: RE: scala.MatchError: class java.sql.Timestamp
Can you provide the exception stack?
Thanks,
Daoyuan
From: Ge, Yao (Y.) [mailto:y...@ford.com]
I need help to better trap exceptions in map functions. What is the best way
to catch an exception and provide some helpful diagnostic information, such as
the source of the input, e.g. the file name (and ideally the line number if I
am processing a text file)?
-Yao
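Not from the thread, but one common pattern: keep the file name with the records and wrap the parse in Try so failures carry their context (parse(...) is a hypothetical parser; note wholeTextFiles loads each file into memory):
'''
import scala.util.{Failure, Success, Try}

// wholeTextFiles keeps (file name, file content) pairs
val results = sc.wholeTextFiles("hdfs:///input/*.txt").flatMap { case (file, content) =>
  content.split("\n").zipWithIndex.map { case (line, i) =>
    Try(parse(line)) match {
      case Success(v)  => Right(v)
      case Failure(ex) => Left(s"$file:${i + 1}: ${ex.getMessage}: $line")
    }
  }
}
val errors = results.collect { case Left(msg) => msg } // RDD of diagnostics
'''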
Thanks very much, Sean!
-Yao
-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Thursday, October 09, 2014 3:04 AM
To: Ge, Yao (Y.)
Cc: user@spark.apache.org
Subject: Re: Dedup
I think the question is about copying the argument. If it's an immutable value
like String, yes just
of the first
argument. Is there a better way to do dedup in Spark?
-Yao
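The reply above is truncated, but the usual Spark dedup is to key on the field and keep one record per key (key(...) is a hypothetical extractor):
'''
val deduped = rdd
  .map(r => (key(r), r))    // key() extracts the dedup field
  .reduceByKey((a, b) => a) // keep an arbitrary record per key; no whole-group shuffle
  .values
'''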
The indices array will need to be in ascending order.
In many cases, it is probably easier to use the other two forms of the
Vectors.sparse function if the indices and value positions are not naturally
sorted.
-Yao
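A minimal sketch of the two call forms (sizes and values arbitrary):
'''
import org.apache.spark.mllib.linalg.Vectors

// indices/values arrays: indices must be in ascending order
val v1 = Vectors.sparse(5, Array(1, 3), Array(7.0, 9.0))

// (index, value) pairs: order does not matter, they are sorted for you
val v2 = Vectors.sparse(5, Seq((3, 9.0), (1, 7.0)))
'''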
From: Ge, Yao (Y.)
Sent: Monday, August 11, 2014 11:44 PM
To: 'u...@spark.incubator.apache.org'
at
org.apache.spark.mllib.clustering.KMeans$$anonfun$17.apply(KMeans.scala:267)
What does this mean? How do I troubleshoot this problem?
Thanks.
-Yao
to the biggest eigenvalue s.toArray(0)*s.toArray(0)?
xj @ Tokyo
On Fri, Aug 8, 2014 at 12:07 PM, Chunnan Yao <yaochun...@gmail.com> wrote:
Hi there, what you've suggested are all meaningful. But to make myself
clearer, my essential problems are:
1. My matrix is asymmetric
2. I need both eigenvalues and eigenvectors, or at
least the biggest eigenvalue and the corresponding eigenvector. It seems
that current Spark doesn't have such an API. Is it possible that I write
eigenvalue decomposition from scratch? What should I do? Thanks a lot!
Miles Yao
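Writing the dominant pair from scratch is feasible: power iteration needs only repeated matrix-vector products, which also distribute well (e.g. via RowMatrix.multiply). A minimal local sketch in plain Scala, assuming the dominant eigenvalue is real and simple (not guaranteed for an arbitrary asymmetric matrix):
'''
// x_{k+1} = A x / ||A x||: converges to the dominant eigenvector
def powerIteration(a: Array[Array[Double]], iters: Int = 100): (Double, Array[Double]) = {
  val n = a.length
  var x = Array.fill(n)(1.0 / math.sqrt(n))
  def mul(v: Array[Double]) = a.map(row => row.zip(v).map { case (r, e) => r * e }.sum)
  for (_ <- 1 to iters) {
    val ax = mul(x)
    val norm = math.sqrt(ax.map(e => e * e).sum)
    x = ax.map(_ / norm)
  }
  val lambda = x.zip(mul(x)).map { case (xi, yi) => xi * yi }.sum // Rayleigh quotient
  (lambda, x)
}
'''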
Hi,
I used 1g of memory for the driver Java process and got an OOM error on the
driver side before reduceByKey. After analyzing the heap dump, the biggest
object is org.apache.spark.MapStatus, which occupied over 900MB of memory.
Here's my question:
1. Are there any optimization switches that I can tune
reduceByKey(_ + _, 100) to use only 100 tasks).
Matei
On May 29, 2014, at 2:03 AM, haitao .yao yao.e...@gmail.com wrote: