pleasant and safe
(due to possible train-serve skews) than it can be. Internally, the lack of
this feature has caused debates about how appropriate Spark really is for
production ML.
Asher Krim
Senior Software Engineer
On Thu, May 18, 2017 at 4:24 AM, Cristian Opris <cristian.b.op...@gmail.com>
Hey Michael,
any update on this? We're itching for a 2.1.1 release (specifically
SPARK-14804 which is currently blocking us)
Thanks,
Asher Krim
Senior Software Engineer
On Wed, Mar 22, 2017 at 7:44 PM, Michael Armbrust <mich...@databricks.com>
wrote:
> An update: I cut the tag for
at was used to train
I think this is one of those things that could live outside the project,
because it's more not-Spark than Spark. Remember too that building a
solution into the project blesses one at the expense of others.
Asher Krim
Senior Software Engineer
On Mon, Mar 13, 2017 at 11:08 A
ussion on.
Thanks,
Asher Krim
Senior Software Engineer
Congrats!
Asher Krim
Senior Software Engineer
On Mon, Feb 13, 2017 at 6:24 PM, Kousuke Saruta <saru...@oss.nttdata.co.jp>
wrote:
> Congratulations, Takuya!
>
> - Kousuke
> On 2017/02/14 7:38, Herman van Hövell tot Westerflier wrote:
>
> Congrats Takuya!
>
> On
It took me a while, but I finally got around to this:
https://github.com/apache/spark/pull/16811/files
On Fri, Jan 6, 2017 at 4:03 AM, Asher Krim <ak...@hubspot.com> wrote:
> Felix - I'm not sure I understand your example about pipeline models,
> could you elaborate? I'm t
loss functions, etc.
>
> *(2) Consistent improvements to core algorithms*
> A less exciting but still very important item will be constantly improving
> the core set of algorithms in MLlib. This could mean speed, scaling,
> robustness, and usability for the few algorithms which cover 90% of use
> cases.
>
> There are plenty of other possibilities, and it will be great to hear the
> community's thoughts!
>
> Thanks,
> Joseph
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
>
>
>
--
Asher Krim
Senior Software Engineer
is supposed to be based on RDDs. This makes these algorithms
unusable for anything larger than toy examples in Spark versions before 2.
If anyone is familiar with this bug, I would really appreciate it if they
could point me in the direction of the PR that fixed it.
Is a 1.6.4 release planned?
Would be possible to
n't cause an API compatibility problem with respect to Java 8
>> lambdas, but if that's settled, I think this could be fixed without
>> breaking the API.
>>
>> On Wed, Jan 18, 2017 at 8:50 PM Asher Krim <ak...@hubspot.com> wrote:
>>
>> In Spark 2 + Jav
using these
constructs correctly? Is there a workaround other than converting the
iterator to an iterable outside of the function?
Thanks,
--
Asher Krim
Senior Software Engineer
> not crazy models. This model could probably easily be serialized as
> individual vectors in this case. It would introduce a
> backwards-compatibility issue but it's possible to read old and new
> formats, I believe.
>
> On Fri, Jan 13, 2017 at 8:16 PM Asher Krim <ak...@hubspot.co
de that serializes models, which are quite small.
>> For example a PCA model consists of a few principal component vectors. It's
>> a Dataset of just one element being saved here. It's re-using the code path
>> normally used to save big data sets, to output 1 file with 1 thing a
> n files.
>
> On Fri, Jan 13, 2017 at 5:23 PM Asher Krim <ak...@hubspot.com> wrote:
>
>> Hi,
>>
>> I'm curious why it's common for data to be repartitioned to 1 partition
>> when saving ml models:
>>
>> sqlContext.createDataFrame(Seq(data)).reparti
Am I missing some benefit of repartitioning like this?
Thanks,
--
Asher Krim
Senior Software Engineer
new method instead of changing the return type of
> the existing one.
>
>
> _____
> From: Asher Krim <ak...@hubspot.com>
> Sent: Wednesday, December 28, 2016 11:52 AM
> Subject: ml word2vec findSynonyms return type
> To: <dev@spark.apache.org>
, so here we are.)
Thanks,
--
Asher Krim
Senior Software Engineer
proposed fix? Would be good to know whether it fixes the
>>> issue.
>>>
>>> On Thu, Sep 22, 2016 at 2:49 PM, Asher Krim <ak...@hubspot.com> wrote:
>>>
>>>> Does anyone know what the status of SPARK-15717 is? It's a simple
>>>> enough
Does anyone know what the status of SPARK-15717 is? It's a simple enough
looking PR, but there has been no activity on it since June 16th.
I believe that we are hitting that bug with checkpointed distributed LDA.
It's a blocker for us and we would really appreciate getting it fixed.
Jira: