Re: UTF-8 Support for TextParser

2018-02-26 Thread Anirudh
Hi Marco,

I understand that there needs to be a separate discussion on the strong
dependency of MXNet on dmlc-core and how to fix it.

Having said that, I think the goals of dmlc-core and MXNet are somewhat
aligned. Posting to the MXNet dev list in this case is a good way to gather
feedback from both communities, since I consider the MXNet community to be
mostly a superset of the dmlc-core community.

Anirudh

On Mon, Feb 26, 2018 at 5:00 PM, Subramanian, Anirudh wrote:

> Hi Tianqi,
>
> UTF-8 support would make other formats like CSV more usable.
> Otherwise, users have to normalize their data in some way before
> using MXNet.
> I understand that there is a tradeoff here because of the efficiency gains
> from the parser, but the expectation of having to normalize their UTF-8
> files may turn users away.
>
> Anirudh
>
> On 2/26/18, 3:54 PM, "workc...@gmail.com on behalf of Tianqi Chen" <
> workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
>
> > Since the LibSVM format is only going to involve numbers and possibly
> > ASCII characters, is there any reason to add UTF-8 support? Note that
> > generalization always comes at a cost in efficiency, and some effort
> > has been spent on making the parser fast.
> >
> > Tianqi
> >
> > On Mon, Feb 26, 2018 at 3:38 PM, Anirudh wrote:
> >
> > > Hi all,
> > >
> > > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> > > text parsers. I am working on adding UTF-8 support for the text
> > > parsers. Since C++ doesn't have great built-in support for UTF-8,
> > > I am looking at third-party libraries which provide Unicode
> > > support. I am currently considering ICU. Any comments,
> > > suggestions, past experience or gotchas about Unicode third-party
> > > libraries, or about adding Unicode support in general, are highly
> > > appreciated.
> > >
> > > I have created an issue about this:
> > > https://github.com/dmlc/dmlc-core/issues/372
> > > Please feel free to reply to this email or comment on the GitHub
> > > issue if you have any input.
> > >
> > > Anirudh
> > >
> >
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Marco de Abreu
The problem is that the DMLC organization and dmlc-core are not part of the
Apache Software Foundation. If that change is specifically for dmlc-core,
it has to be discussed in that community. This email list is for MXNet
under the Apache Incubator.

Apparently, there's a very strong dependency of MXNet on the dmlc-core
package, which is not managed by this community. Risks like code over there
not being properly validated by our CI (there has been a thread created by
Chris just recently) aside, this is not the way an Apache project should
work. MXNet is currently under the Apache Software Foundation while the
actual core is managed by the DMLC organization, leaving the MXNet
community without any say in decisions happening over there.

We as a community should discuss whether we want to keep this strong
dependency.

Anirudh wrote on Tue, Feb 27, 2018, 00:51:

> The code is going to go in the dmlc repository. What is wrong with
> referencing the dmlc repository issue?
>
> On Mon, Feb 26, 2018 at 3:48 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > That's not what I mean. Please create a proper issue and don't just
> > reference the DMLC repository.
> >
> > Anirudh wrote on Tue, Feb 27, 2018, 00:46:
> >
> > > Sure! Here is the link to the issue in the MXNet repo:
> > > https://github.com/apache/incubator-mxnet/issues/9891
> > >
> > > On Mon, Feb 26, 2018 at 3:41 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Since DMLC is not affiliated with Apache, please create a GitHub
> > > > issue on our repository and link the issue here in order to
> > > > provide a base for discussions.
> > > >
> > > > -Marco
> > > >
> > > > Anirudh wrote on Tue, Feb 27, 2018, 00:38:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> > > > > text parsers. I am working on adding UTF-8 support for the text
> > > > > parsers. Since C++ doesn't have great built-in support for UTF-8,
> > > > > I am looking at third-party libraries which provide Unicode
> > > > > support. I am currently considering ICU. Any comments,
> > > > > suggestions, past experience or gotchas about Unicode third-party
> > > > > libraries, or about adding Unicode support in general, are highly
> > > > > appreciated.
> > > > >
> > > > > I have created an issue about this:
> > > > > https://github.com/dmlc/dmlc-core/issues/372
> > > > > Please feel free to reply to this email or comment on the GitHub
> > > > > issue if you have any input.
> > > > >
> > > > > Anirudh
> > > > >
> > > >
> > >
> >
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Tianqi Chen
Since the LibSVM format is only going to involve numbers and possibly ASCII
characters, is there any reason to add UTF-8 support? Note that
generalization always comes at a cost in efficiency, and some effort has
been spent on making the parser fast.

Tianqi

On Mon, Feb 26, 2018 at 3:38 PM, Anirudh wrote:

> Hi all,
>
> Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> text parsers. I am working on adding UTF-8 support for the text
> parsers. Since C++ doesn't have great built-in support for UTF-8,
> I am looking at third-party libraries which provide Unicode
> support. I am currently considering ICU. Any comments,
> suggestions, past experience or gotchas about Unicode third-party
> libraries, or about adding Unicode support in general, are highly
> appreciated.
>
> I have created an issue about this:
> https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the GitHub
> issue if you have any input.
>
> Anirudh
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Anirudh
The code is going to go in the dmlc repository. What is wrong with
referencing the dmlc repository issue?

On Mon, Feb 26, 2018 at 3:48 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> That's not what I mean. Please create a proper issue and don't just
> reference the DMLC repository.
>
> Anirudh wrote on Tue, Feb 27, 2018, 00:46:
>
> > Sure! Here is the link to the issue in the MXNet repo:
> > https://github.com/apache/incubator-mxnet/issues/9891
> >
> > On Mon, Feb 26, 2018 at 3:41 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > Hello,
> > >
> > > Since DMLC is not affiliated with Apache, please create a GitHub
> > > issue on our repository and link the issue here in order to
> > > provide a base for discussions.
> > >
> > > -Marco
> > >
> > > Anirudh wrote on Tue, Feb 27, 2018, 00:38:
> > >
> > > > Hi all,
> > > >
> > > > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> > > > text parsers. I am working on adding UTF-8 support for the text
> > > > parsers. Since C++ doesn't have great built-in support for UTF-8,
> > > > I am looking at third-party libraries which provide Unicode
> > > > support. I am currently considering ICU. Any comments,
> > > > suggestions, past experience or gotchas about Unicode third-party
> > > > libraries, or about adding Unicode support in general, are highly
> > > > appreciated.
> > > >
> > > > I have created an issue about this:
> > > > https://github.com/dmlc/dmlc-core/issues/372
> > > > Please feel free to reply to this email or comment on the GitHub
> > > > issue if you have any input.
> > > >
> > > > Anirudh
> > > >
> > >
> >
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Marco de Abreu
That's not what I mean. Please create a proper issue and don't just
reference the DMLC repository.

Anirudh wrote on Tue, Feb 27, 2018, 00:46:

> Sure! Here is the link to the issue in the MXNet repo:
> https://github.com/apache/incubator-mxnet/issues/9891
>
> On Mon, Feb 26, 2018 at 3:41 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > Hello,
> >
> > Since DMLC is not affiliated with Apache, please create a GitHub
> > issue on our repository and link the issue here in order to
> > provide a base for discussions.
> >
> > -Marco
> >
> > Anirudh wrote on Tue, Feb 27, 2018, 00:38:
> >
> > > Hi all,
> > >
> > > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> > > text parsers. I am working on adding UTF-8 support for the text
> > > parsers. Since C++ doesn't have great built-in support for UTF-8,
> > > I am looking at third-party libraries which provide Unicode
> > > support. I am currently considering ICU. Any comments,
> > > suggestions, past experience or gotchas about Unicode third-party
> > > libraries, or about adding Unicode support in general, are highly
> > > appreciated.
> > >
> > > I have created an issue about this:
> > > https://github.com/dmlc/dmlc-core/issues/372
> > > Please feel free to reply to this email or comment on the GitHub
> > > issue if you have any input.
> > >
> > > Anirudh
> > >
> >
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Anirudh
Sure! Here is the link to the issue in the MXNet repo:
https://github.com/apache/incubator-mxnet/issues/9891

On Mon, Feb 26, 2018 at 3:41 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Hello,
>
> Since DMLC is not affiliated with Apache, please create a GitHub
> issue on our repository and link the issue here in order to
> provide a base for discussions.
>
> -Marco
>
> Anirudh wrote on Tue, Feb 27, 2018, 00:38:
>
> > Hi all,
> >
> > Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> > text parsers. I am working on adding UTF-8 support for the text
> > parsers. Since C++ doesn't have great built-in support for UTF-8,
> > I am looking at third-party libraries which provide Unicode
> > support. I am currently considering ICU. Any comments,
> > suggestions, past experience or gotchas about Unicode third-party
> > libraries, or about adding Unicode support in general, are highly
> > appreciated.
> >
> > I have created an issue about this:
> > https://github.com/dmlc/dmlc-core/issues/372
> > Please feel free to reply to this email or comment on the GitHub
> > issue if you have any input.
> >
> > Anirudh
> >
>


Re: UTF-8 Support for TextParser

2018-02-26 Thread Marco de Abreu
Hello,

Since DMLC is not affiliated with Apache, please create a GitHub issue on
our repository and link the issue here in order to provide a base for
discussions.

-Marco

Anirudh wrote on Tue, Feb 27, 2018, 00:38:

> Hi all,
>
> Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
> text parsers. I am working on adding UTF-8 support for the text
> parsers. Since C++ doesn't have great built-in support for UTF-8,
> I am looking at third-party libraries which provide Unicode
> support. I am currently considering ICU. Any comments,
> suggestions, past experience or gotchas about Unicode third-party
> libraries, or about adding Unicode support in general, are highly
> appreciated.
>
> I have created an issue about this:
> https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the GitHub
> issue if you have any input.
>
> Anirudh
>


UTF-8 Support for TextParser

2018-02-26 Thread Anirudh
Hi all,

Currently there is no UTF-8 support for the LibSVM, LibFM or CSV
text parsers. I am working on adding UTF-8 support for the text
parsers. Since C++ doesn't have great built-in support for UTF-8,
I am looking at third-party libraries which provide Unicode
support. I am currently considering ICU. Any comments,
suggestions, past experience or gotchas about Unicode third-party
libraries, or about adding Unicode support in general, are highly
appreciated.

I have created an issue about this:
https://github.com/dmlc/dmlc-core/issues/372
Please feel free to reply to this email or comment on the GitHub
issue if you have any input.

Anirudh
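
For readers following the efficiency-versus-generality point in this thread, here is a minimal sketch of one way a byte-oriented parser could keep an ASCII fast path and check for well-formed UTF-8 only when a non-ASCII byte shows up. It is an illustration under stated assumptions, not dmlc-core code: the names IsAscii and IsValidUtf8 are hypothetical, it uses only the C++ standard library rather than ICU, and it only validates encoding (no normalization, collation or locale handling, which is where a library like ICU would come in).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of dmlc-core): true when every byte is
// 7-bit ASCII, so an existing byte-oriented parser can be used unchanged.
inline bool IsAscii(const std::string& s) {
  for (unsigned char c : s) {
    if (c >= 0x80) return false;
  }
  return true;
}

// Hypothetical helper (not part of dmlc-core): true when `s` is well-formed
// UTF-8 per RFC 3629. Rejects stray or missing continuation bytes, overlong
// encodings, surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
inline bool IsValidUtf8(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) {  // single-byte (ASCII) code point
      ++i;
      continue;
    }
    std::size_t len = 0;   // total bytes in this sequence
    std::uint32_t cp = 0;  // decoded code point
    if ((b0 & 0xE0) == 0xC0) {
      len = 2;
      cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3;
      cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4;
      cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (n - i < len) return false;  // sequence truncated at end of input
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms: the value must need this many bytes.
    if (len == 2 && cp < 0x80) return false;
    if (len == 3 && cp < 0x800) return false;
    if (len == 4 && cp < 0x10000) return false;
    if (cp > 0x10FFFF) return false;                  // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
    i += len;
  }
  return true;
}
```

With helpers like these, purely numeric LibSVM rows would pass IsAscii and take the existing path, while a CSV field such as "caf\xC3\xA9" would fail IsAscii and go through the slower validated path. Whether that split is worth the extra branch is exactly the trade-off discussed above.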


Re: Refresh issues in master version - Need help from Apache Infra team

2018-02-26 Thread Marco de Abreu
Hello,

I verified this issue and I don't understand why some pages are not getting
refreshed. Maybe there's a problem on Apache Infra's side.

@Mentors, can we create a ticket to request assistance?

Best regards,
Marco

santhosh karuhatty wrote on Mon, Feb 26, 2018, 21:02:

> Hello All,
>
> Recently, we did a build of the MXNet website and I am seeing an issue with
> pages not serving the latest content.
>
> *Issue*: When switching to the master version, we don't see the 1.1.0
> version listed in the Versions dropdown.
>
> *Repro details:*
>
>    - Visit http://mxnet.incubator.apache.org/; you will land on v1.1.0
>    in the Versions dropdown menu.
>    - Switch to master from the dropdown and then check the list in
>    Versions. Notice that 1.1.0 is not listed.
>    - I manually verified that the code for listing 1.1.0 is in place
>    under versions/master. See
>    https://github.com/apache/incubator-mxnet-site/blob/asf-site/versions/master/index.html#L99
>    where 1.1.0 is listed in index.html.
>    - I also hosted the asf-site branch locally and can see 1.1.0 listed
>    when switched to master (http://localhost/versions/master/index.html).
>
> Can someone from the Infra team verify why there is a discrepancy here?
> Thanks
>
> -Santhosh
>


Refresh issues in master version - Need help from Apache Infra team

2018-02-26 Thread santhosh karuhatty
Hello All,

Recently, we did a build of the MXNet website and I am seeing an issue with
pages not serving the latest content.

*Issue*: When switching to the master version, we don't see the 1.1.0
version listed in the Versions dropdown.

*Repro details:*

   - Visit http://mxnet.incubator.apache.org/; you will land on v1.1.0
   in the Versions dropdown menu.
   - Switch to master from the dropdown and then check the list in
   Versions. Notice that 1.1.0 is not listed.
   - I manually verified that the code for listing 1.1.0 is in place
   under versions/master. See
   https://github.com/apache/incubator-mxnet-site/blob/asf-site/versions/master/index.html#L99
   where 1.1.0 is listed in index.html.
   - I also hosted the asf-site branch locally and can see 1.1.0 listed
   when switched to master (http://localhost/versions/master/index.html).

Can someone from the Infra team verify why there is a discrepancy here?
Thanks

-Santhosh