Congratulations, Peter and Xiduo.
On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan wrote:
> Hi all,
>
> The Spark PMC recently voted to add two new committers. Please join me in
> welcoming them to their new role!
>
> - Peter Toth (Spark SQL)
> - Xiduo You (Spark SQL)
>
> They consistently make
Hi Vibhor,
We worked on a project to create Lucene indexes using Spark, but the project
has not been maintained for some time now. If there is interest, we can
resurrect it.
Congratulations, Xinrong!
On Tue, Aug 9, 2022, 10:00 PM Rui Wang wrote:
> Congrats Xinrong!
>
>
> -Rui
>
> On Tue, Aug 9, 2022 at 8:57 PM Xingbo Jiang wrote:
>
>> Congratulations!
>>
>> Yuanjian Li wrote on Tuesday, August 9, 2022 at 20:31:
>>
>>> Congratulations, Xinrong!
>>>
>>> XiDuo You wrote on Tuesday, August 9, 2022 at 19:18:
Congratulations to the whole Spark community! It's a great achievement.
On Sat, May 14, 2022, 2:49 AM Yikun Jiang wrote:
> Awesome! Congrats to the whole community!
>
> On Fri, May 13, 2022 at 3:44 AM Matei Zaharia
> wrote:
>
>> Hi all,
>>
>> We recently found out that Apache Spark received
[
https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728020#comment-16728020
]
Debasish Das commented on SPARK-24374:
--
Hi [~mengxr] with barrier mode available is it not possible
The open source implementation of Dremel is Parquet!
On Mon, Oct 29, 2018, 8:42 AM Gourav Sengupta
wrote:
> Hi,
>
> why not just use dremel?
>
> Regards,
> Gourav Sengupta
>
> On Mon, Oct 29, 2018 at 1:35 PM lchorbadjiev <
> lubomir.chorbadj...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to reproduce the
[
https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507150#comment-16507150
]
Debasish Das edited comment on BEAM-3737 at 6/9/18 8:21 PM:
I saw
[
https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507150#comment-16507150
]
Debasish Das commented on BEAM-3737:
I saw this is being mentioned in TFMA...I am also not clear why
[
https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407543#comment-16407543
]
Debasish Das commented on BEAM-2810:
[~chamikara] did you try fastavro and pyavroc as well? Both
[
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407540#comment-16407540
]
Debasish Das commented on BEAM-1442:
Thanks [~robertwb]...I will look into BEAM-2810 if we can fix
[
https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407538#comment-16407538
]
Debasish Das commented on BEAM-2810:
I will try reading bq from beam directly but during iterative
[
https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407532#comment-16407532
]
Debasish Das commented on BEAM-2810:
our flow starts from bq-export/gcs avro files and 1 node sizable
[
https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407241#comment-16407241
]
Debasish Das commented on BEAM-1442:
Hi... I am pushing 10MB Avro files locally and the idea is to push
I have written a Spark-Lucene integration as part of the Verizon trapezium/dal
project... you can extract the data stored in the HDFS indices and feed it to
Spark...
https://github.com/Verizon/trapezium/tree/master/dal/src/test/scala/com/verizon/bda/trapezium/dal
I intend to publish it as a Spark package as
Hi,
ECOS is a solver for second-order cone programs and we showed the Spark
integration at the 2014 Spark Summit:
https://spark-summit.org/2014/quadratic-programing-solver-for-non-negative-matrix-factorization/.
Right now the examples show how to reformulate matrix factorization as an
SOCP and solve
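As a hedged illustration of that reformulation (a standard epigraph trick, not
necessarily the exact form used in the talk): each nonnegative quadratic
subproblem

  min_x  (1/2) x^T H x + q^T x   subject to  x >= 0,   with H = R^T R (R a Cholesky factor),

can be posed as a second-order cone program by bounding the quadratic term with
an extra variable t:

  min_{x,t}  (1/2) t + q^T x   subject to  || (2 R x, t - 1) ||_2 <= t + 1,   x >= 0,

since that cone constraint is equivalent to ||R x||^2 <= t. This
linear-objective, cone-constrained form is what a conic solver like ECOS
consumes.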
If you can point me to previous benchmarks that have been done, I would like to
try smoothing and see whether the LBFGS convergence improves without impacting
the linear SVC loss.
Thanks.
Deb
On Dec 16, 2017 7:48 PM, "Debasish Das" <debasish.da...@gmail.com> wrote:
Hi Weichen,
Traditionall
hould be considered
> carefully.
> Is there any literature that proves changing max to soft-max can behave
> well?
> I’m more than happy to see some benchmarks if you can have.
>
> + Yuhao, who did similar effort in this PR: https://github.com/apache/
> spark/pull/17862
>
> Rega
Hi,
I looked into the LinearSVC flow and found the gradient for hinge as
follows:
Our loss function with {0, 1} labels is max(0, 1 - (2y - 1) * f_w(x)).
Therefore the gradient is -(2y - 1) * x when the margin 1 - (2y - 1) * f_w(x)
is positive, and 0 otherwise. max is a non-smooth function.
Did we try using a ReLu/Softmax-style function to smooth the hinge
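To make the smoothing idea concrete, here is a minimal plain-Scala sketch
(helper names are mine, not from LinearSVC) of the hinge sub-gradient above
next to a softplus-smoothed variant log(1 + exp(z)), one common way to smooth
max(0, z):

  import scala.math.{exp, log}

  // margin z = 1 - (2y - 1) * f, with label y in {0, 1} and score f = w.x
  def hingeLoss(y: Double, f: Double): Double = math.max(0.0, 1.0 - (2 * y - 1) * f)

  // sub-gradient w.r.t. w: -(2y - 1) * x when the margin is violated, else 0
  def hingeGradient(y: Double, f: Double, x: Array[Double]): Array[Double] =
    if (1.0 - (2 * y - 1) * f > 0) x.map(v => v * -(2 * y - 1)) else Array.fill(x.length)(0.0)

  // softplus smoothing of max(0, z); its gradient is sigmoid(z) times dz/dw
  def smoothedHingeLoss(y: Double, f: Double): Double = log(1.0 + exp(1.0 - (2 * y - 1) * f))

  def smoothedHingeGradient(y: Double, f: Double, x: Array[Double]): Array[Double] = {
    val z = 1.0 - (2 * y - 1) * f
    val sigma = 1.0 / (1.0 + exp(-z))       // derivative of log(1 + exp(z)) w.r.t. z
    x.map(v => v * -(2 * y - 1) * sigma)    // chain rule through f = w.x
  }

A smoothed loss like this is what would let LBFGS see a proper gradient
everywhere instead of the kink at the hinge.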
+1
Is there any design doc related to the API/internal changes? Will CP be the
default in structured streaming, or is it a mode used in conjunction with the
existing behavior?
Thanks.
Deb
On Nov 1, 2017 8:37 AM, "Reynold Xin" wrote:
Earlier I sent out a discussion thread for CP in
You can run l
On May 15, 2017 3:29 PM, "Nipun Arora" wrote:
> Thanks all for your response. I will have a look at them.
>
> Nipun
>
> On Sat, May 13, 2017 at 2:38 AM vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> It's in scala but it should be
If it is 7M rows and 700K features (or say 1M features), brute-force row
similarity will run fine as well... check out SPARK-4823... you can compare
quality with the approximate variant...
On Feb 9, 2017 2:55 AM, "nguyen duc Tuan" wrote:
> Hi everyone,
> Since spark 2.1.0
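For a sense of what the brute-force baseline looks like, here is a tiny local
sketch (plain Scala, toy data; SPARK-4823 does the same computation block-wise
over an RDD of rows):

  // cosine similarity between two dense rows
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  val rows = Array(Array(1.0, 0.0, 2.0), Array(0.0, 3.0, 1.0), Array(1.0, 1.0, 0.0))
  // all upper-triangular pairs; this quadratic cost is what LSH tries to avoid
  val sims = for (i <- rows.indices; j <- rows.indices if i < j)
    yield ((i, j), cosine(rows(i), rows(j)))

Comparing these exact scores against the LSH output is one way to measure the
quality of the approximate variant.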
y to call predict on single vector.
> There is no API exposed. It is WIP but not yet released.
>
> On Sat, Feb 4, 2017 at 11:07 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
>> If we expose an API to access the raw models out of PipelineModel can't
>> we call predict direc
, graph and kernel models we use a lot, and for them it turned out that
mllib-style model predict was useful if we change the underlying store...
On Feb 4, 2017 9:37 AM, "Debasish Das" <debasish.da...@gmail.com> wrote:
> If we expose an API to access the raw models out of PipelineMo
res to score through spark.ml.Model
>predict API". The predict API is in the old mllib package not the new ml
>package.
>- "why r we using dataframe and not the ML model directly from API" -
>Because as of now the new ml package does not have the direct API.
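As a minimal sketch of that direct path (assuming an old-mllib
LogisticRegressionModel, an existing SparkContext named sc, and a hypothetical
save path; the new ml package at this point only scored DataFrames):

  import org.apache.spark.mllib.classification.LogisticRegressionModel
  import org.apache.spark.mllib.linalg.Vectors

  // load a previously saved mllib model and score one feature vector, no DataFrame needed
  val model = LogisticRegressionModel.load(sc, "hdfs:///models/lr")  // path is hypothetical
  val score = model.predict(Vectors.dense(0.5, 1.2, -0.3))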
I am not sure why I would use a pipeline to do scoring... the idea is to build
a model, use the model ser/deser feature to put it in the row or column store
of choice, and provide API access to the model... we support these primitives
in github.com/Verizon/trapezium... the API has access to the spark context in
You may want to pull up the release/1.2 branch and the 1.2.0 tag to build it
yourself in case the packages are not available.
On Jan 15, 2017 2:55 PM, "Md. Rezaul Karim"
wrote:
> Hi Ayan,
>
> Thanks a million.
>
> Regards,
> _
> *Md. Rezaul
[
https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809876#comment-15809876
]
Debasish Das commented on SPARK-10078:
--
I looked into the code and I see we are replicating Breeze
[
https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793770#comment-15793770
]
Debasish Das commented on SPARK-10078:
--
[~mengxr] [~dlwh] is it possible to implement VL-BFGS
[
https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793760#comment-15793760
]
Debasish Das edited comment on SPARK-10078 at 1/3/17 12:26 AM:
---
Ideally
[
https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793760#comment-15793760
]
Debasish Das commented on SPARK-10078:
--
Ideally feature partitioning should be automatically tuned
[
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777650#comment-15777650
]
Debasish Das edited comment on SPARK-13857 at 12/26/16 5:57 AM:
item
[
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777650#comment-15777650
]
Debasish Das commented on SPARK-13857:
--
item->item and user->user was done in an old PR
Hi,
I need to add col1: Array[String], col2: Array[Int] and col3: Array[Float] to
docvalues.
col1: Array[String] is a sparse dimension from the OLAP world.
col2: Array[Int] + col3: Array[Float] represent a sparse vector for a sparse
measure from the OLAP world, with dictionary encoding for col1 mapped to col2.
I have
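A rough sketch of one way to attach those columns as Lucene doc values (field
names are mine; note that the sorted doc-values types reorder values, so if
col2/col3 must stay aligned as parallel arrays a BinaryDocValuesField with a
custom encoding would be needed instead):

  import org.apache.lucene.document.{Document, SortedNumericDocValuesField, SortedSetDocValuesField}
  import org.apache.lucene.util.{BytesRef, NumericUtils}

  // col1: multi-valued string dimension; col2/col3: the sparse-vector indices and values
  def addDocValues(doc: Document, col1: Array[String], col2: Array[Int], col3: Array[Float]): Unit = {
    col1.foreach(s => doc.add(new SortedSetDocValuesField("col1", new BytesRef(s))))
    col2.foreach(i => doc.add(new SortedNumericDocValuesField("col2", i.toLong)))
    // floats stored as sortable ints so they survive the numeric doc-values encoding
    col3.foreach(f => doc.add(new SortedNumericDocValuesField("col3", NumericUtils.floatToSortableInt(f).toLong)))
  }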
[
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581366#comment-15581366
]
Debasish Das commented on SPARK-5992:
-
Also, do you have a hash function for Euclidean distance? We use
[
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581361#comment-15581361
]
Debasish Das commented on SPARK-5992:
-
Did you compare with brute-force kNN? Normally LSH does
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581359#comment-15581359
]
Debasish Das commented on SPARK-4823:
-
We use it in multiple use cases internally but did not get time
Thanks Cody for bringing up a valid point... I picked up Spark in 2014 as
soon as I looked into it, since compared to writing Java map-reduce and
Cascading code, Spark made writing distributed code fun... But now, as we
go deeper with Spark and the real-time streaming use case gets more
prominent, I
[
https://issues.apache.org/jira/browse/SPARK-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411023#comment-15411023
]
Debasish Das commented on SPARK-6932:
-
[~rxin] [~sowen] Do we have any other active parameter server
> gives an idea. Is it possible to make this more efficient? I don't want to
>> use probabilistic functions, and I will cache the matrix because many
>> distances are looked up in the matrix; computing them on demand would
>> require far more computation.
>>
>>
[
https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315935#comment-15315935
]
Debasish Das edited comment on SPARK-9834 at 6/5/16 4:49 PM:
-
Do you have
[
https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315935#comment-15315935
]
Debasish Das commented on SPARK-9834:
-
Do you have runtime comparisons showing that when features <= 4096,
Simultaneous actions work fine on a cluster if they are independent... in
local mode I never paid attention, but the code path should be similar...
On Jan 18, 2016 8:00 AM, "Koert Kuipers" wrote:
> stacktrace? details?
>
> On Mon, Jan 18, 2016 at 5:58 AM, Mennour Rostom
Decoupling mllib and core is difficult... it is not intended that you run Spark
core 1.5 with a Spark mllib 1.6 snapshot... core is more stable while new
algorithms keep getting added to mllib, so sometimes you might be tempted to do
that, but it's not recommended.
On Nov 21, 2015 8:04 PM, "Reynold Xin"
to add. You can add an issue in
breeze for the enhancement.
Alternatively you can use the breeze LP solver as well, which uses the simplex
implementation from Apache Commons Math.
On Nov 4, 2015 1:05 AM, "Zhiliang Zhu" <zchl.j...@yahoo.com> wrote:
> Hi Debasish Das,
>
> Firstly I must show my deep appreciat
>
> On Mon, Nov 2, 2015 at 6:03 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Use breeze simplex, which in turn uses the Apache Commons Math simplex... if you
> > want to use an interior point method you can use ECOS
> > https://github.com/embotech/ecos-java-scala ...
Use breeze simplex, which in turn uses the Apache Commons Math simplex... if you
want to use an interior point method you can use ECOS
https://github.com/embotech/ecos-java-scala ... the Spark Summit 2014 talk on
the quadratic solver in matrix factorization will show you an example
integration with Spark. ECOS runs as JNI
You can run 2 threads in the driver and Spark will FIFO-schedule the 2 jobs on
the same SparkContext you created (executors and cores)... the same idea is
used in the Spark SQL Thrift server flow...
For streaming, I think it lets you run only one stream at a time even if you
run them on multiple threads on
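A minimal sketch of that pattern (assuming an existing SparkContext named sc;
the jobs and data are made up):

  import scala.concurrent.{Await, Future}
  import scala.concurrent.ExecutionContext.Implicits.global
  import scala.concurrent.duration.Duration

  // two independent jobs submitted from separate driver threads; the default FIFO
  // scheduler queues both on the same SparkContext and its executors
  val job1 = Future { sc.parallelize(1 to 1000000).map(_ * 2).count() }
  val job2 = Future { sc.parallelize(1 to 1000000).filter(_ % 3 == 0).count() }

  val counts = (Await.result(job1, Duration.Inf), Await.result(job2, Duration.Inf))

With spark.scheduler.mode set to FAIR, the jobs share cluster resources more
evenly instead of the first submitted job taking priority.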
RDD nesting can lead to recursive nesting... I would like to know the
use case and why join can't support it... you can always expose an API over an
RDD and access that in another RDD's mapPartitions... use an external data
source like HBase, Cassandra, or Redis to support the API...
For your case, group by and
[
https://issues.apache.org/jira/browse/SPARK-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735706#comment-14735706
]
Debasish Das commented on SPARK-10408:
--
[~avulanov] In MLP can we change BFGS to OWLQN and get L1
[
https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734170#comment-14734170
]
Debasish Das edited comment on SPARK-9834 at 9/8/15 3:18 PM:
-
[~mengxr] If you
[
https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734170#comment-14734170
]
Debasish Das commented on SPARK-9834:
-
If you are open to using breeze.proximal.QuadraticMinimizer we
Not sure about dropout, but if you change the solver from breeze BFGS to breeze
OWLQN or breeze.proximal.NonlinearMinimizer you can solve the ANN loss with L1
regularization, which will yield elastic-net-style sparse solutions... using
that you can clean up edges which have 0.0 as weight...
On Sep 7, 2015 7:35
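A small sketch of the OWLQN route with breeze (the objective here is a toy
least-squares stand-in for the ANN loss, and the constants are made up):

  import breeze.linalg.{DenseMatrix, DenseVector}
  import breeze.optimize.{DiffFunction, OWLQN}

  // toy smooth objective 0.5 * ||A w - b||^2; OWLQN adds the L1 penalty itself,
  // so the sparsity shows up as exact zeros in the returned weights
  val A = DenseMatrix.rand(50, 10)
  val b = DenseVector.rand(50)

  val f = new DiffFunction[DenseVector[Double]] {
    def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
      val r = A * w - b
      (0.5 * (r dot r), A.t * r)
    }
  }

  // maxIter = 100, LBFGS memory = 10, L1 regularization = 0.1
  val owlqn = new OWLQN[Int, DenseVector[Double]](100, 10, 0.1)
  val w = owlqn.minimize(f, DenseVector.zeros[Double](10))
  val zeroedEdges = w.toArray.count(_ == 0.0)  // weights driven exactly to zero can be pruned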
[
https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734130#comment-14734130
]
Debasish Das commented on SPARK-10078:
--
[~mengxr] will it be Breeze LBFGS modification or part
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-4823:
Attachment: SparkMeetup2015-Experiments2.pdf
SparkMeetup2015-Experiments1.pdf
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648340#comment-14648340
]
Debasish Das commented on SPARK-4823:
-
We did more detailed experiment for July 2015
, the access path is as follows:
Spark SQL JDBC Interface -> Spark SQL Parser/Analyzer/Optimizer -> Astro
Optimizer -> HBase Scans/Gets -> … -> HBase Region server
Regards,
Yan
*From:* Debasish Das [mailto:debasish.da...@gmail.com]
*Sent:* Monday, July 27, 2015 10:02 PM
*To:* Yan Zhou.sc
Hi Yan,
Is it possible to access the HBase table through the Spark SQL JDBC layer?
Thanks.
Deb
On Jul 22, 2015 9:03 PM, Yan Zhou.sc yan.zhou...@huawei.com wrote:
Yes, but not all SQL-standard insert variants.
*From:* Debasish Das [mailto:debasish.da...@gmail.com]
*Sent:* Wednesday, July 22
AM, Debasish Das debasish.da...@gmail.com
wrote:
Yeah, I think the idea of confidence is a bit different from what I am
looking for: using implicit factorization to do document clustering.
I basically need (r_ij - w_i . h_j)^2 for all observed ratings and (0 -
w_i . h_j)^2 for all the unobserved entries.
I will think further, but in the current implicit formulation with
confidence, it looks like I am factorizing a 0/1 matrix with weights 1 +
alpha*rating for observed (1) values and 1 for unobserved (0) values. It's
a bit different from the LSA model.
On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das
heavily skewed to pay attention to the
high-count instances.
On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das debasish.da...@gmail.com
wrote:
Yeah, I think the idea of confidence is a bit different from what I am
looking for: using implicit factorization to do document clustering.
I
Hi,
Implicit factorization is important for us since it drives recommendations
when modeling user click/no-click, and also topic modeling to handle 0
counts in document x word matrices through NMF and Sparse Coding.
I am a bit confused on this code:
val c1 = alpha * math.abs(rating)
if (rating
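For what it's worth, a hedged sketch of what that confidence term is doing in
the implicit formulation (Hu/Koren/Volinsky style; function and variable names
are mine, not the ALS source):

  // c1 = alpha * |r| is the extra confidence on top of the baseline weight 1 that
  // every entry (observed or not) carries; the model factorizes the binary preference p
  def implicitSquaredError(rating: Double, alpha: Double, pred: Double): Double = {
    val c1 = alpha * math.abs(rating)
    val p = if (rating > 0) 1.0 else 0.0      // binary preference being factorized
    (1.0 + c1) * (p - pred) * (p - pred)      // confidence-weighted squared error
  }

So the objective really is a weighted factorization of a 0/1 matrix, which is
what makes it different from a plain LSA-style least squares.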
Does it also support insert operations?
On Jul 22, 2015 4:53 PM, Bing Xiao (Bing) bing.x...@huawei.com wrote:
We are happy to announce the availability of the Spark SQL on HBase
1.0.0 release.
http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
The main features in this
Hi,
First of all, congratulations on the release of akka-streams and akka-http!
I am writing a service and spray was my initial choice, but with the akka-http
and spray merge I am more inclined to start learning and using akka-http.
This service needs to manage a SparkContext and most likely
What do you need in SparkR that mllib / ml don't have... most of the basic
analysis that you need on a stream can be done through mllib components...
On Jul 13, 2015 2:35 PM, Feynman Liang fli...@databricks.com wrote:
Sorry; I think I may have used poor wording. SparkR will let you use R to
How do you manage the SparkContext elastically when your load grows from
1000 users to 1 users?
On Tue, Jul 14, 2015 at 8:31 AM, Hafsa Asif hafsa.a...@matchinguu.com
wrote:
I have almost the same case. I will tell you what I am actually doing, if
it
is according to your requirement,
how far
it can be pushed.
Thanks for your help!
-- Eric
On Tue, Jun 30, 2015 at 5:28 PM, Debasish Das debasish.da...@gmail.com
wrote:
I got good runtime improvement from hive partitioning, caching the
dataset and increasing the cores through repartition... I think for your
case
I got good runtime improvement from hive partitioning, caching the dataset
and increasing the cores through repartition... I think for your case
generating mysql-style indexing will help further... it is not supported in
Spark SQL yet...
I know the dataset might be too big for 1-node mysql, but do
Hi,
Akka cluster uses a gossip protocol for master election. The approach in
Spark right now is to use ZooKeeper for high availability.
Interestingly, Cassandra and Redis clusters both use a gossip protocol.
I am not sure what the default behavior is right now. If the master dies
and ZooKeeper
Model sizes are in the 10M x rank, 100K x rank range.
For recommendation/topic modeling I can run batch recommendAll and then
keep serving the model using a distributed cache, but then I can't
incorporate per-user re-prediction if user feedback is making the
current top-k stale. I have to wait for the next
and reload factors from S3 periodically.
We then use Elasticsearch to post-filter results and blend content-based
stuff - which I think might be more efficient than SparkSQL for this
particular purpose.
On Wed, Jun 24, 2015 at 8:59 AM, Debasish Das debasish.da...@gmail.com
wrote:
Model sizes
, 2015 at 12:21 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I have some Impala-created Parquet tables which Hive 0.13.2 can read fine.
Now when I want to read the same table using Spark SQL 1.3, I am getting an
exception that the class parquet.hive.serde.ParquetHiveSerde is not
found
engine probably doesn't matter at all in comparison.
On Sat, Jun 20, 2015, 9:40 PM Debasish Das debasish.da...@gmail.com wrote:
After getting used to Scala, writing Java is too much work :-)
I am looking for a Scala-based project that's using Netty at its core (spray
is one example
Hi,
I have some Impala-created Parquet tables which Hive 0.13.2 can read fine.
Now when I want to read the same table using Spark SQL 1.3, I am getting an
exception that the class parquet.hive.serde.ParquetHiveSerde is not
found.
I am assuming that hive somewhere is putting the
Hi,
The demo of the end-to-end ML pipeline, including the model server component,
at Spark Summit was really cool.
I was wondering if the model server component is based upon Velox or
uses a completely different architecture.
https://github.com/amplab/velox-modelserver
We are looking for an open
Congratulations to all.
DB, great work in bringing quasi-Newton methods to Spark!
On Wed, Jun 17, 2015 at 3:18 PM, Chester Chen ches...@alpinenow.com wrote:
Congratulations to all.
DB and Sandy, great work!
On Wed, Jun 17, 2015 at 3:12 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Integration of model server with ML pipeline API.
On Sat, Jun 20, 2015 at 12:25 PM, Donald Szeto don...@prediction.io wrote:
Mind if I ask what 1.3/1.4 ML features that you are looking for?
On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
After getting used to Scala
charles.ce...@gmail.com
wrote:
Is velox NOT open source?
On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
The demo of end-to-end ML pipeline including the model server component
at Spark Summit was really cool.
I was wondering if the Model Server component is based
Also not sure how threading helps here, because Spark assigns a partition to
each core. On each core there may be multiple threads if you are using
Intel hyperthreading, but I would let Spark handle the threading.
On Thu, Jun 18, 2015 at 8:38 AM, Debasish Das debasish.da...@gmail.com
wrote:
We
We added SPARK-3066 for this. In 1.4 you should get the code to do BLAS
dgemm based calculation.
On Thu, Jun 18, 2015 at 8:20 AM, Ayman Farahat
ayman.fara...@yahoo.com.invalid wrote:
Thanks Sabarish and Nick
Would you happen to have some code snippets that you can share.
Best
Ayman
On Jun
Also, in my experiments it's much faster to do blocked BLAS through cartesian
rather than doing sc.union. Here are the details on the experiments:
https://issues.apache.org/jira/browse/SPARK-4823
On Thu, Jun 18, 2015 at 8:40 AM, Debasish Das debasish.da...@gmail.com
wrote:
Also not sure how
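A minimal sketch of the cartesian-plus-gemm idea (userBlocks/itemBlocks are
hypothetical RDDs of block id to factor matrix, one matrix per block of users
or items, each numRowsInBlock x rank):

  import breeze.linalg.DenseMatrix
  import org.apache.spark.rdd.RDD

  // score every user block against every item block with one gemm per pair,
  // instead of unioning many smaller per-block jobs
  def blockedScores(userBlocks: RDD[(Int, DenseMatrix[Double])],
                    itemBlocks: RDD[(Int, DenseMatrix[Double])]): RDD[((Int, Int), DenseMatrix[Double])] = {
    userBlocks.cartesian(itemBlocks).map { case ((ub, users), (ib, items)) =>
      ((ub, ib), users * items.t)   // breeze dispatches this multiply to BLAS gemm
    }
  }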
Running L1 and picking the non-zero coefficients gives a good estimate of
interesting features as well...
On Jun 17, 2015 4:51 PM, Xiangrui Meng men...@gmail.com wrote:
We don't have it in MLlib. The closest would be the ChiSqSelector,
which works for categorical data. -Xiangrui
On Thu, Jun 11,
[
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583886#comment-14583886
]
Debasish Das commented on SPARK-2336:
-
Very cool idea Sen. Did you also look
[
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583886#comment-14583886
]
Debasish Das edited comment on SPARK-2336 at 6/12/15 6:51 PM
It's always better to use a quasi-Newton solver if the runtime and problem
scale permit, as there are guarantees on optimization... OWLQN and BFGS are
both quasi-Newton.
Most single-node code bases will run quasi-Newton solves... if you are
using SGD, it is better to use AdaDelta/AdaGrad or similar
What is a decision list? An in-order traversal (or some other traversal) of the
fitted decision tree?
On Jun 5, 2015 1:21 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote:
Is there an existing way in SparkML to convert a decision tree to a
decision list?
On Thu, Jun 4, 2015 at 10:50 PM, Reza Zadeh
Hi,
We want to keep the model created and loaded in memory through the Spark batch
context, since blocked matrix operations are required to optimize the runtime.
The data is streamed in through Kafka / raw sockets and a Spark Streaming
Context. We want to run some prediction operations with the
[
https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-6323:
Affects Version/s: (was: 1.4.0)
Large rank matrix factorization with Nonlinear loss
Wendell pwend...@gmail.com wrote:
Yes - spark packages can include non ASF licenses.
On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Is it possible to add GPL/LGPL code to Spark packages or must it be
licensed
under Apache as well?
I want to expose
Yu yuzhih...@gmail.com wrote:
Pardon me.
Please use '8192k'
Cheers
On Sat, May 23, 2015 at 6:24 PM, Debasish Das debasish.da...@gmail.com
wrote:
Tried 8mb...still I am failing on the same error...
On Sat, May 23, 2015 at 6:10 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. it should be 8mb
Hi,
I am on last week's master, but all the examples that set up the following
.set("spark.kryoserializer.buffer", "8m")
are failing with the following error:
Exception in thread "main" java.lang.IllegalArgumentException:
spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
looks
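A hedged sketch of the workaround suggested later in this thread (on that
build the value "8m" ends up being read as 8192 MB, so spell the size out in
kilobytes; the max-buffer line is just a common companion setting):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.kryoserializer.buffer", "8192k")      // i.e. 8 MB
    .set("spark.kryoserializer.buffer.max", "64m")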
Hi,
Is it possible to add GPL/LGPL code to Spark packages or must it be
licensed under Apache as well?
I want to expose Professor Tim Davis's LGPL library for sparse algebra and the
ECOS GPL library through the package.
Thanks.
Deb
Tried 8mb...still I am failing on the same error...
On Sat, May 23, 2015 at 6:10 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. it should be 8mb
Please use the above syntax.
Cheers
On Sat, May 23, 2015 at 6:04 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am on last week's master
[
https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-4823:
Attachment: MovieLensSimilarity Comparisons.pdf
The attached file shows the runtime comparison
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557416#comment-14557416
]
Debasish Das commented on SPARK-2426:
-
[~mengxr] Should I add the PR to spark packages
Hi,
What was the motivation to write power iteration clustering using GraphX
and not a matrix-vector multiplication over the similarity matrix represented
as, say, a CoordinateMatrix?
We can use gemv in that flow to block the computation.
Over GraphX can we do the all-k eigenvector computation together
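A toy local sketch of the matrix-vector flow being suggested (dense breeze
matrices stand in for what would be a CoordinateMatrix/BlockMatrix with a
blocked gemv on a cluster):

  import breeze.linalg.{DenseMatrix, DenseVector, norm}

  // one power-iteration step of PIC: v <- W * v, renormalized with the L1 norm
  def powerIterationStep(w: DenseMatrix[Double], v: DenseVector[Double]): DenseVector[Double] = {
    val wv = w * v
    wv / norm(wv, 1.0)
  }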