Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Dongjin Lee
> SPARK-23539 is a non-trivial improvement, so probably would not be
back-ported to 2.4.x.

Got it. It seems reasonable.

Committers:

Please don't omit SPARK-23539 from 2.5.0; the Kafka community needs this
feature.

Thanks,
Dongjin

On Tue, Feb 12, 2019 at 1:50 PM Takeshi Yamamuro 
wrote:

> +1, too.
> branch-2.4 has accumulated a lot of commits:
>
> https://github.com/apache/spark/compare/0a4c03f7d084f1d2aa48673b99f3b9496893ce8d...af3c7111efd22907976fc8bbd7810fe3cfd92092
>
> On Tue, Feb 12, 2019 at 12:36 PM Dongjoon Hyun 
> wrote:
>
>> Thank you, DB.
>>
>> +1, yes. It's time to prepare the 2.4.1 release.
>>
>> Bests,
>> Dongjoon.
>>
>> On 2019/02/12 03:16:05, Sean Owen  wrote:
>> > I support a 2.4.1 release now, yes.
>> >
>> > SPARK-23539 is a non-trivial improvement, so probably would not be
>> > back-ported to 2.4.x. SPARK-26154 does look like a bug whose fix could
>> > be back-ported, but that's a big change. I wouldn't hold up 2.4.1 for
>> > it, but it could go in if otherwise ready.
>> >
>> >
>> > On Mon, Feb 11, 2019 at 5:20 PM Dongjin Lee  wrote:
>> > >
>> > > Hi DB,
>> > >
>> > > Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a
>> while ago, but it has neither been included in 2.3.0 nor received enough review.
>> > >
>> > > Thanks,
>> > > Dongjin
>> > >
>> > > [^1]: https://issues.apache.org/jira/browse/SPARK-23539
>> > > [^2]: https://github.com/apache/spark/pull/22282
>> > >
>> > > On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim 
>> wrote:
>> > >>
>> > >> Given SPARK-26154 [1] is a correctness issue and its PR [2] is
>> submitted, I hope it can be reviewed and included in Spark 2.4.1 -
>> otherwise it will remain a long-lived correctness issue.
>> > >>
>> > >> Thanks,
>> > >> Jungtaek Lim (HeartSaVioR)
>> > >>
>> > >> 1. https://issues.apache.org/jira/browse/SPARK-26154
>> > >> 2. https://github.com/apache/spark/pull/23634
>> > >>
>> > >>
>> > >> 2019년 2월 12일 (화) 오전 6:17, DB Tsai 님이 작성:
>> > >>>
>> > >>> Hello all,
>> > >>>
>> > >>> I am preparing to cut a new Apache Spark 2.4.1 release, as there are
>> many bugs and correctness issues fixed in branch-2.4.
>> > >>>
>> > >>> The list of addressed issues is at
>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>> > >>>
>> > >>> Let me know if you have any concerns or any PRs you would like to
>> get in.
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>>
>> > >>>
>> > >
>> > >
>> > > --
>> > > Dongjin Lee
>> > >
>> > > A hitchhiker in the mathematical world.
>> > >
>> > > github: github.com/dongjinleekr
>> > > linkedin: kr.linkedin.com/in/dongjinleekr
>> > > speakerdeck: speakerdeck.com/dongjin
>> >
>> >
>> >
>>
>>
>>
>
> --
> ---
> Takeshi Yamamuro
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github: github.com/dongjinleekr <https://github.com/dongjinleekr>
linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*


Re: Time to cut an Apache 2.4.1 release?

2019-02-11 Thread Dongjin Lee
Hi DB,

Could you add SPARK-23539[^1] into 2.4.1? I opened the PR[^2] a while ago,
but it has neither been included in 2.3.0 nor received enough review.

Thanks,
Dongjin

[^1]: https://issues.apache.org/jira/browse/SPARK-23539
[^2]: https://github.com/apache/spark/pull/22282

On Tue, Feb 12, 2019 at 6:28 AM Jungtaek Lim  wrote:

> Given SPARK-26154 [1] is a correctness issue and its PR [2] is submitted, I
> hope it can be reviewed and included in Spark 2.4.1 - otherwise it will
> remain a long-lived correctness issue.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/SPARK-26154
> 2. https://github.com/apache/spark/pull/23634
>
>
> On Tue, Feb 12, 2019 at 6:17 AM, DB Tsai wrote:
>
>> Hello all,
>>
>> I am preparing to cut a new Apache Spark 2.4.1 release, as there are many
>> bugs and correctness issues fixed in branch-2.4.
>>
>> The list of addressed issues is at
>> https://issues.apache.org/jira/browse/SPARK-26583?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.4.1%20order%20by%20updated%20DESC
>>
>> Let me know if you have any concerns or any PRs you would like to get in.
>>
>> Thanks!
>>
>>
>>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github: github.com/dongjinleekr <https://github.com/dongjinleekr>
linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*


Re: Ask for reviewing on Structured Streaming PRs

2018-12-12 Thread Dongjin Lee
If possible, could you also review my PR[^1] on Kafka's header
functionality[^2]? Headers were added in Kafka 0.11.0.0 but are still not
supported in Spark.

Thanks,
Dongjin

[^1]: https://github.com/apache/spark/pull/22282
[^2]: https://issues.apache.org/jira/browse/KAFKA-4208

On Wed, Dec 12, 2018 at 6:43 PM Jungtaek Lim  wrote:

> Hi devs,
>
> May I kindly ask for reviews on PRs for Structured Streaming? I have 5
> open pull requests on the SS side [1] (the earliest was opened around 4
> months ago), and there are a couple of PRs from others [2] that look
> ready to be reviewed, too.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
>
> 1.
> https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+author%3AHeartSaVioR+%5BSS%5D
> 2.
> https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+%5BSS%5D+
>
>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github: github.com/dongjinleekr <https://github.com/dongjinleekr>
linkedin: kr.linkedin.com/in/dongjinleekr <https://kr.linkedin.com/in/dongjinleekr>
speakerdeck: speakerdeck.com/dongjin <https://speakerdeck.com/dongjin>*
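
For context on the Kafka header PR mentioned above, here is a sketch of the
user-facing shape such support could take. This is a hedged sketch only: the
includeHeaders option and the resulting headers column are assumptions drawn
from the PR discussion, not an API that had shipped at the time, and a
SparkSession named spark is assumed in scope.

// Hypothetical usage once Kafka record headers are exposed to Spark.
val records = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .option("includeHeaders", "true") // assumed flag, following the PR discussion
  .load()

// Headers would surface as an array of (key, value) pairs next to key/value:
records.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")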


Re: welcome a new batch of committers

2018-10-03 Thread Dongjin Lee
Congratulations to ALL!!

- Dongjin

On Wed, Oct 3, 2018 at 7:48 PM Jack Kolokasis 
wrote:

> Congratulations to all !!
>
> -Iacovos
>
> On 03/10/2018 12:54 PM, Ted Yu wrote:
>
> Congratulations to all !
>
> -------- Original message --------
> From: Jungtaek Lim  
> Date: 10/3/18 2:41 AM (GMT-08:00)
> To: Marco Gaido  
> Cc: dev  
> Subject: Re: welcome a new batch of committers
>
> Congrats all! You all deserved it.
> On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido  wrote:
>
>> Congrats you all!
>>
>> On Wed, Oct 3, 2018 at 11:29 AM Liang-Chi Hsieh
>> wrote:
>>
>>>
>>> Congratulations to all new committers!
>>>
>>>
>>> rxin wrote
>>> > Hi all,
>>> >
>>> > The Apache Spark PMC has recently voted to add several new committers
>>> to
>>> > the project, for their contributions:
>>> >
>>> > - Shane Knapp (contributor to infra)
>>> > - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
>>> > - Kazuaki Ishizaki (contributor to Spark SQL)
>>> > - Xingbo Jiang (contributor to Spark Core and SQL)
>>> > - Yinan Li (contributor to Spark on Kubernetes)
>>> > - Takeshi Yamamuro (contributor to Spark SQL)
>>> >
>>> > Please join me in welcoming them!
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>>
>>>
> --
> Iacovos Kolokasis
> Email: koloka...@ics.forth.gr
> Postgraduate Student CSD, University of Crete
> Researcher in CARV Lab ICS FORTH
>
> --
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*


Re: from_csv

2018-09-19 Thread Dongjin Lee
Another +1.

I have run into this case several times already.

On Mon, Sep 17, 2018 at 11:03 AM Hyukjin Kwon  wrote:

> +1 for this idea since text parsing in CSV/JSON is quite common.
>
> One thing to consider is schema inference, as with the JSON functionality.
> In the case of JSON, we added schema_of_json for it, and the same thing
> should apply to CSV too.
> If we see more need for it, we can consider a function like
> schema_of_csv as well.
>
>
> On Sun, Sep 16, 2018 at 4:41 PM, Maxim Gekk wrote:
>
>> Hi Reynold,
>>
>> > i'd make this as consistent as to_json / from_json as possible
>>
>> Sure, new function from_csv() has the same signature as from_json().
>>
>> > how would this work in sql? i.e. how would passing options in work?
>>
>> The options are passed to the function via map, for example:
>> select from_csv('26/08/2015', 'time Timestamp', map('timestampFormat',
>> 'dd/MM/yyyy'))
>>
>> On Sun, Sep 16, 2018 at 7:01 AM Reynold Xin  wrote:
>>
>>> makes sense - i'd make this as consistent as to_json / from_json as
>>> possible.
>>>
>>> how would this work in sql? i.e. how would passing options in work?
>>>
>>> --
>>> excuse the brevity and lower case due to wrist injury
>>>
>>>
>>> On Sat, Sep 15, 2018 at 2:58 AM Maxim Gekk 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I would like to propose a new function, from_csv(), for parsing columns
>>>> containing strings in CSV format. Here is my PR:
>>>> https://github.com/apache/spark/pull/22379
>>>>
>>>> A use case is loading a dataset from external storage, a DBMS, or a
>>>> system like Kafka, into which CSV content was dumped as one of the
>>>> columns/fields. Other columns could contain related information like
>>>> timestamps, ids, data sources, etc. The column with CSV strings can
>>>> be parsed by the existing csv() method of DataFrameReader, but in that
>>>> case we have to "clean up" the dataset and remove the other columns,
>>>> since the csv() method requires a Dataset[String]. Joining the parsing
>>>> result back to the original dataset by position is expensive and not
>>>> convenient. Instead, users parse CSV columns with string functions. That
>>>> approach is usually error-prone, especially for quoted values and other
>>>> special cases.
>>>>
>>>> The methods proposed in the PR should provide a better user experience
>>>> for parsing CSV-like columns. Please share your thoughts.
>>>>
>>>> --
>>>>
>>>> Maxim Gekk
>>>>
>>>> Technical Solutions Lead
>>>>
>>>> Databricks Inc.
>>>>
>>>> maxim.g...@databricks.com
>>>>
>>>> databricks.com
>>>>
>>>>   <http://databricks.com/>
>>>>
>>>
>>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github: github.com/dongjinleekr <http://github.com/dongjinleekr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
slideshare: www.slideshare.net/dongjinleekr <http://www.slideshare.net/dongjinleekr>*
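
As a concrete illustration of the proposal discussed in this thread, here is
a minimal sketch of how from_csv() could be used from Scala, assuming it
lands with the same shape as from_json() (column, schema, options map). This
follows the PR discussion rather than a released API.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object FromCsvSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("from_csv sketch").getOrCreate()
    import spark.implicits._

    // A Kafka-like dataset: CSV content dumped into one column, with related
    // columns (here, an offset) kept alongside it.
    val df = Seq((1L, "26/08/2015,foo"), (2L, "27/08/2015,bar")).toDF("offset", "value")

    val schema = new StructType()
      .add("time", TimestampType)
      .add("name", StringType)

    // Options are passed as a map, mirroring the SQL form quoted in the thread:
    //   select from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'))
    val parsed = df.select(
      $"offset",
      from_csv($"value", schema, Map("timestampFormat" -> "dd/MM/yyyy")).as("csv"))

    parsed.select($"offset", $"csv.time", $"csv.name").show(false)
    spark.stop()
  }
}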


Re: MatrixUDT and VectorUDT in Spark ML

2018-05-30 Thread Dongjin Lee
What is the status of this issue? Is there a JIRA ticket for it?

Thanks,
Dongjin

On Sat, Mar 24, 2018 at 1:39 PM, Himanshu Mohan <
himanshu.mo...@aexp.com.invalid> wrote:

> I agree
>
> Thanks
>
> Himanshu
>
> *From:* Li Jin [mailto:ice.xell...@gmail.com]
> *Sent:* Friday, March 23, 2018 8:24 PM
> *To:* dev 
> *Subject:* MatrixUDT and VectorUDT in Spark ML
>
> Hi All,
>
> I came across the two types MatrixUDT and VectorUDT in Spark ML when
> doing feature extraction and preprocessing with PySpark. However, when
> trying to do some basic operations, such as vector multiplication and
> matrix multiplication, I had to go down to a Python UDF.
>
> It seems it would be very useful to have built-in operators on these
> types, just like for first-class Spark SQL types, e.g.,
>
> df.withColumn('v', df.matrix_column * df.vector_column)
>
> I wonder what other people's thoughts on this are?
>
> Li
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github: github.com/dongjinleekr <http://github.com/dongjinleekr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
slideshare: www.slideshare.net/dongjinleekr <http://www.slideshare.net/dongjinleekr>*
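
For context, here is the workaround Li Jin describes, sketched in Scala (a
minimal sketch; the column names follow his example): with no built-in
operators on MatrixUDT/VectorUDT columns, the multiplication has to go
through a UDF.

import org.apache.spark.ml.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object MatrixVectorUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udt udf sketch").getOrCreate()
    import spark.implicits._

    // One row with a 2x2 identity matrix and a vector, using the ML UDT types.
    val df = Seq(
      (Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0)), Vectors.dense(3.0, 4.0))
    ).toDF("matrix_column", "vector_column")

    // No built-in * operator exists for these UDT columns, so drop down to a UDF:
    val multiply = udf { (m: Matrix, v: Vector) => m.multiply(v) }
    df.withColumn("v", multiply($"matrix_column", $"vector_column")).show(false)
    spark.stop()
  }
}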


Missing config property in documentation

2017-05-03 Thread Dongjin Lee
Hello. I found that the property 'spark.resultGetter.threads'[^1] is not
listed in the official documentation. I wonder whether this is intentional
or just an oversight.

If it is not intentional, it would be good to update the documentation.
What do you think?

Thanks,
Dongjin

[^1]:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala
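
For reference: TaskResultGetter uses this property to size the thread pool
that deserializes task results on the driver (the default appears to be 4;
treat that as an assumption from reading the linked source). A minimal
sketch of setting it, like any other Spark conf:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the undocumented property is set like any other configuration key.
val conf = new SparkConf()
  .setAppName("result-getter-example")
  .setMaster("local[*]")
  .set("spark.resultGetter.threads", "8") // assumed default is 4 in TaskResultGetter
val sc = new SparkContext(conf)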

-- 
*Dongjin Lee*



*A hitchhiker in the mathematical world.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*


Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-21 Thread Dongjin Lee
Hi Chetan,

Sadly, you cannot; Spark is configured to ignore null values when
writing JSON (check JacksonMessageWriter and find
JsonInclude.Include.NON_NULL in the code). If you want that
functionality, it would be best to file an issue in JIRA.
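
For example, here is a minimal demonstration sketch (assuming a Spark 2.x
session named spark with its implicits imported):

import spark.implicits._

val df = Seq(("Dongjin", null: String)).toDF("first_name", "last_name")
df.toJSON.show(false)
// Prints {"first_name":"Dongjin"}: the null field is dropped instead of
// being written out as "last_name": null.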

Best,
Dongjin

On Mon, Mar 20, 2017 at 4:44 PM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Exactly.
>
> On Sat, Mar 11, 2017 at 1:35 PM, Dongjin Lee <dong...@apache.org> wrote:
>
>> Hello Chetan,
>>
>> Could you post some code? If I understood correctly, you are trying to
>> save JSON like:
>>
>> {
>>   "first_name": "Dongjin",
>>   "last_name: null
>> }
>>
>> not in omitted form, like:
>>
>> {
>>   "first_name": "Dongjin"
>> }
>>
>> right?
>>
>> - Dongjin
>>
>> On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hello Dev / Users,
>>>
>>> I am migrating PySpark code to Scala. With Python, iterating Spark with
>>> a dictionary and generating JSON with nulls is possible with
>>> json.dumps(), which will be converted to SparkSQL [Row]; but in Scala,
>>> how can we generate JSON with null values as a DataFrame?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> *Dongjin Lee*
>>
>>
>> *Software developer in Line+. So interested in massive-scale machine
>> learning.*
>> *facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
>> linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
>> github: github.com/dongjinleekr <http://github.com/dongjinleekr>
>> twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*
>>
>
>


-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine
learning.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*


Re: Spark Local Pipelines

2017-03-13 Thread Dongjin Lee
Although I love Asher's cool idea, I'd rather +1 Sean's view; I think
this would be much better off living outside of the project.

Best,
Dongjin

On Mon, Mar 13, 2017 at 5:39 PM, Sean Owen <so...@cloudera.com> wrote:

> I'm skeptical.  Serving synchronous queries from a model at scale is a
> fundamentally different activity. As you note, it doesn't logically involve
> Spark. If it has to happen in milliseconds it's going to be in-core.
> Scoring even 10qps with a Spark job per request is probably a non-starter;
> think of the thousands of tasks per second and the overhead of just
> tracking them.
>
> When you say the RDDs support point prediction, I think you mean that
> those older models expose a method to score a Vector. They are not somehow
> exposing distributed point prediction. You could add this to the newer
> models, but it raises the question of how to make the Row to feed it; the
> .mllib API punts on this and assumes you can construct the Vector.
>
> I think this sweeps a lot under the rug in assuming that there can just be
> a "local" version of every Transformer -- but, even if there could be,
> consider how much extra implementation that is. Lots of them probably could
> be but I'm not sure that all can.
>
> The bigger problem in my experience is the Pipelines don't generally
> encapsulate the entire pipeline from source data to score. They encapsulate
> the part after computing underlying features. That is, if one of your
> features is "total clicks from this user", that's the product of a
> DataFrame operation that precedes a Pipeline. This can't be turned into a
> non-distributed, non-Spark local version.
>
> Solving subsets of this problem could still be useful, and you've
> highlighted some external projects that try. I'd also highlight PMML as an
> established interchange format for just the model part, and for cases that
> don't involve much or any pipeline, it's a better fit paired with a library
> that can score from PMML.
>
> I think this is one of those things that could live outside the project,
> because it's more not-Spark than Spark. Remember too that building a
> solution into the project blesses one at the expense of others.
>
>
> On Sun, Mar 12, 2017 at 10:15 PM Asher Krim <ak...@hubspot.com> wrote:
>
>> Hi All,
>>
>> I spent a lot of time at Spark Summit East this year talking with Spark
>> developers and committers about challenges with productizing Spark. One of
>> the biggest shortcomings I've encountered in Spark ML pipelines is the lack
>> of a way to serve single requests with any reasonable performance.
>> SPARK-10413 explores adding methods for single item prediction, but I'd
>> like to explore a more holistic approach - a separate local api, with
>> models that support transformations without depending on Spark at all.
>>
>> I've written up a doc
>> <https://docs.google.com/document/d/1Ha4DRMio5A7LjPqiHUnwVzbaxbev6ys04myyz6nDgI4/edit?usp=sharing>
>> detailing the approach, and I'm happy to discuss alternatives. If this
>> gains traction, I can create a branch with a minimal example on a simple
>> transformer (probably something like CountVectorizerModel) so we have
>> something concrete to continue the discussion on.
>>
>> Thanks,
>> Asher Krim
>> Senior Software Engineer
>>
>


-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine
learning.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*
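
To make the point-prediction gap Sean describes concrete, here is a small
sketch (the hand-built weights are purely illustrative):

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// The older .mllib models expose in-core single-item scoring directly:
// a plain method call on a Vector, with no SparkContext and no job scheduled.
val model = new LogisticRegressionModel(Vectors.dense(1.0, -1.0), 0.0)
val score = model.predict(Vectors.dense(0.3, 0.1)) // purely local computation
println(score)

// The spark.ml counterparts only expose transform(DataFrame), so scoring a
// single request means building a one-row DataFrame and running Spark tasks,
// which is exactly the overhead that makes low-latency serving impractical.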


Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-11 Thread Dongjin Lee
Hello Chetan,

Could you post some code? If I understood correctly, you are trying to save
JSON like:

{
  "first_name": "Dongjin",
  "last_name: null
}

not in omitted form, like:

{
  "first_name": "Dongjin"
}

right?

- Dongjin

On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Hello Dev / Users,
>
> I am migrating PySpark code to Scala. With Python, iterating Spark with a
> dictionary and generating JSON with nulls is possible with json.dumps(),
> which will be converted to SparkSQL [Row]; but in Scala, how can we
> generate JSON with null values as a DataFrame?
>
> Thanks.
>



-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine
learning.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*


Re: GraphX-related "open" issues

2017-01-19 Thread Dongjin Lee
Thanks for your comments. Then, how about changing the following issues (see
below) to 'Won't Fix'? After implementing and uploading them as Spark
Packages, commenting on those issues would be a reasonable solution. It
would also be better for potential users of those graph algorithms.

- SPARK-15880: PREGEL Based Semi-Clustering Algorithm Implementation using
Spark GraphX API <https://issues.apache.org/jira/browse/SPARK-15880>
- SPARK-7244: Find vertex sequences satisfying predicates
<https://issues.apache.org/jira/browse/SPARK-7244>
- SPARK-7257: Find nearest neighbor satisfying predicate
<https://issues.apache.org/jira/browse/SPARK-7257>
- SPARK-8497: Graph Clique(Complete Connected Sub-graph) Discovery Algorithm
<https://issues.apache.org/jira/browse/SPARK-8497>

Best,
Dongjin

On Fri, Jan 20, 2017 at 2:48 AM, Michael Allman <mich...@videoamp.com>
wrote:

> Regarding new GraphX algorithms, I am in agreement with the idea of
> publishing algorithms which are implemented using the existing API as
> outside packages.
>
> Regarding SPARK-10335, we have a PR for SPARK-5484 which should address
> the problem described in that ticket. I've reviewed that PR, but because it
> touches the ML codebase I'd like to get an ML committer to review that PR.
> It's a relatively simple change and fixes a significant barrier to scaling
> in GraphX.
>
> https://github.com/apache/spark/pull/15125
>
> Cheers,
>
> Michael
>
>
> On Jan 19, 2017, at 8:09 AM, Takeshi Yamamuro <linguin@gmail.com>
> wrote:
>
> Thanks for your comment, Dongjin!
> I have a pretty basic but also important question: why not implement
> these features as a third-party library (and then upload them to Spark
> Packages, https://spark-packages.org/)? ISTM GraphX already has the
> necessary and sufficient APIs for these third-party ones.
>
> On Thu, Jan 19, 2017 at 12:21 PM, Dongjin Lee <dong...@apache.org> wrote:
>
>> Hi all,
>>
>> I am currently working on SPARK-15880[^1] and also have some interest
>> in SPARK-7244[^2] and SPARK-7257[^3]. In fact, SPARK-7244 and SPARK-7257
>> have some importance in the field of graph analysis.
>> Could you make them an exception? Since I am working on graph analysis, I
>> would like to take them.
>>
>> If needed, I can take SPARK-10335 and SPARK-8497 after them.
>>
>> Thanks,
>> Dongjin
>>
>> On Wed, Jan 18, 2017 at 2:40 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> WontFix or Later is fine. There's not really any practical distinction.
>>> I figure that if something times out and is closed, it's very unlikely to
>>> be looked at again. Therefore marking it as something to do 'later' seemed
>>> less accurate.
>>>
>>> On Tue, Jan 17, 2017 at 5:30 PM Takeshi Yamamuro <linguin@gmail.com>
>>> wrote:
>>>
>>>> Thanks for your comment!
>>>> I'm just thinking I'll set "Won't Fix" though, "Later" is also okay.
>>>> But, I re-checked "Contributing to JIRA Maintenance" in the
>>>> contribution guide (http://spark.apache.org/contributing.html) and
>>>> I couldn't find any setting policy about "Later".
>>>> So, IMO it's okay to set "Won't Fix" for now, and those who'd like to
>>>> make PRs should feel free to (re-)open tickets.
>>>>
>>>>
>>>> On Wed, Jan 18, 2017 at 1:48 AM, Dongjoon Hyun <dongj...@apache.org>
>>>> wrote:
>>>>
>>>> Hi, Takeshi.
>>>>
>>>> > So, IMO it seems okay to close tickets about "Improvement" and "New
>>>> Feature" for now.
>>>>
>>>> I'm just wondering about what kind of field value you want to fill in
>>>> the `Resolution` field for those issues.
>>>>
>>>> Maybe, 'Later'? Or, 'Won't Fix'?
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>
>>
>> --
>> *Dongjin Lee*
>>
>>
>> *Software developer in Line+. So interested in massive-scale machine
>> learning.*
>> *facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
>> linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
>> github: github.com/dongjinleekr <http://github.com/dongjinleekr>
>> twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
>
>


-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine
learning.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*
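
To make the API-sufficiency point in this thread concrete, here is a toy
sketch built only on the public Pregel operator (a stand-in illustration,
not one of the algorithms listed above): it propagates the minimum vertex id
along edge direction.

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object PregelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pregel sketch").getOrCreate()
    val sc = spark.sparkContext

    // A small directed chain: 1 -> 2 -> 3 -> 4, with vertex ids as attributes.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0L).mapVertices((id, _) => id)

    // Vertex program keeps the minimum seen; messages push smaller ids downstream.
    val propagated = graph.pregel(Long.MaxValue)(
      (_, attr, msg) => math.min(attr, msg),
      t => if (t.srcAttr < t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
      math.min
    )
    propagated.vertices.collect().sortBy(_._1).foreach(println)
    spark.stop()
  }
}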


Re: GraphX-related "open" issues

2017-01-18 Thread Dongjin Lee
Hi all,

I am currently working on SPARK-15880[^1] and also have some interest
in SPARK-7244[^2] and SPARK-7257[^3]. In fact, SPARK-7244 and SPARK-7257
have some importance in the field of graph analysis.
Could you make them an exception? Since I am working on graph analysis, I
would like to take them.

If needed, I can take SPARK-10335 and SPARK-8497 after them.

Thanks,
Dongjin

On Wed, Jan 18, 2017 at 2:40 AM, Sean Owen <so...@cloudera.com> wrote:

> WontFix or Later is fine. There's not really any practical distinction. I
> figure that if something times out and is closed, it's very unlikely to be
> looked at again. Therefore marking it as something to do 'later' seemed
> less accurate.
>
> On Tue, Jan 17, 2017 at 5:30 PM Takeshi Yamamuro <linguin@gmail.com>
> wrote:
>
>> Thanks for your comment!
>> I'm just thinking I'll set "Won't Fix" though, "Later" is also okay.
>> But, I re-checked "Contributing to JIRA Maintenance" in the contribution
>> guide (http://spark.apache.org/contributing.html) and
>> I couldn't find any setting policy about "Later".
>> So, IMO it's okay to set "Won't Fix" for now, and those who'd like to make
>> PRs should feel free to (re-)open tickets.
>>
>>
>> On Wed, Jan 18, 2017 at 1:48 AM, Dongjoon Hyun <dongj...@apache.org>
>> wrote:
>>
>> Hi, Takeshi.
>>
>> > So, IMO it seems okay to close tickets about "Improvement" and "New
>> Feature" for now.
>>
>> I'm just wondering about what kind of field value you want to fill in the
>> `Resolution` field for those issues.
>>
>> Maybe, 'Later'? Or, 'Won't Fix'?
>>
>> Bests,
>> Dongjoon.
>>
>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>


-- 
*Dongjin Lee*


*Software developer in Line+. So interested in massive-scale machine
learning.*
*facebook: www.facebook.com/dongjin.lee.kr <http://www.facebook.com/dongjin.lee.kr>
linkedin: kr.linkedin.com/in/dongjinleekr <http://kr.linkedin.com/in/dongjinleekr>
github: github.com/dongjinleekr <http://github.com/dongjinleekr>
twitter: www.twitter.com/dongjinleekr <http://www.twitter.com/dongjinleekr>*