Re: UTF-8 Support for TextParser
Hi Marco,

I understand that the strong dependency of MXNet on dmlc-core, and how to fix it, needs a separate discussion. Having said that, I think the goals of dmlc-core and MXNet are somewhat aligned. Posting to the MXNet dev list in this case is a good way to gather feedback from both communities, since I consider the MXNet community to be mostly a superset of the dmlc-core community.

Anirudh

On Mon, Feb 26, 2018 at 5:00 PM, Subramanian, Anirudh wrote:
> Hi Tianqi,
>
> UTF-8 support would make other formats like CSV more usable. Otherwise, users have to normalize their data in some way before using mxnet.
> I understand that there is a tradeoff here because of the efficiency gains from the parser, but the expectation of having to normalize their UTF-8 files may turn users away.
>
> Anirudh
>
> On 2/26/18, 3:54 PM, "workc...@gmail.com on behalf of Tianqi Chen" <workc...@gmail.com on behalf of tqc...@cs.washington.edu> wrote:
>
> > Since the LibSVM format is only going to involve numbers and possibly ASCII characters, is there any reason to add UTF-8 support? Note that generalization always comes at a cost in efficiency, and some effort has been spent on making the parser fast.
> >
> > Tianqi
Re: UTF-8 Support for TextParser
The problem is that the DMLC organization and dmlc-core are not part of the Apache Software Foundation. If that change is specifically for dmlc-core, it has to be discussed in that community. This mailing list is for MXNet under the Apache Incubator.

Apparently, MXNet has a very strong dependency on the dmlc-core package, which is not managed by this community. Risks like code over there not being properly validated by our CI (Chris created a thread about this just recently) aside, this is not the way an Apache project should work. MXNet is currently under the Apache Software Foundation while the actual core is managed by the DMLC organization, leaving the MXNet community without any say in decisions made over there. We as a community should discuss whether we want to keep this strong dependency.

Anirudh wrote on Tue, Feb 27, 2018, 00:51:
> The code is going to go in the dmlc repository. What is wrong with referencing the dmlc repository issue?
Re: UTF-8 Support for TextParser
Since the LibSVM format is only going to involve numbers and possibly ASCII characters, is there any reason to add UTF-8 support? Note that generalization always comes at a cost in efficiency, and some effort has been spent on making the parser fast.

Tianqi

On Mon, Feb 26, 2018 at 3:38 PM, Anirudh wrote:
> Hi all,
>
> Currently there is no UTF-8 support in the LibSVM, LibFM, or CSV text parsers. I am working on adding UTF-8 support for the text parsers. Since C++ doesn't have great built-in support for UTF-8, I am looking at third-party libraries that provide Unicode support. I am currently considering ICU. Any comments, suggestions, past experience, or gotchas about third-party Unicode libraries or adding Unicode support in general are highly appreciated.
>
> I have created an issue for this: https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the GitHub issue if you have any input.
>
> Anirudh
Re: UTF-8 Support for TextParser
The code is going to go in the dmlc repository. What is wrong with referencing the dmlc repository issue?

On Mon, Feb 26, 2018 at 3:48 PM, Marco de Abreu <marco.g.ab...@googlemail.com> wrote:
> That's not what I mean. Please create a proper issue and don't just reference the DMLC repository.
Re: UTF-8 Support for TextParser
That's not what I mean. Please create a proper issue and don't just reference the DMLC repository.

Anirudh wrote on Tue, Feb 27, 2018, 00:46:
> Sure! Here is the link to the issue in the MXNet repo:
> https://github.com/apache/incubator-mxnet/issues/9891
Re: UTF-8 Support for TextParser
Sure! Here is the link to the issue in the MXNet repo: https://github.com/apache/incubator-mxnet/issues/9891

On Mon, Feb 26, 2018 at 3:41 PM, Marco de Abreu <marco.g.ab...@googlemail.com> wrote:
> Hello,
>
> since DMLC is not affiliated with Apache, please create a GitHub issue on our repository and link the issue here in order to provide a base for discussions.
>
> -Marco
Re: UTF-8 Support for TextParser
Hello,

since DMLC is not affiliated with Apache, please create a GitHub issue on our repository and link the issue here in order to provide a base for discussions.

-Marco

Anirudh wrote on Tue, Feb 27, 2018, 00:38:
> Hi all,
>
> Currently there is no UTF-8 support in the LibSVM, LibFM, or CSV text parsers. I am working on adding UTF-8 support for the text parsers. Since C++ doesn't have great built-in support for UTF-8, I am looking at third-party libraries that provide Unicode support. I am currently considering ICU. Any comments, suggestions, past experience, or gotchas about third-party Unicode libraries or adding Unicode support in general are highly appreciated.
>
> I have created an issue for this: https://github.com/dmlc/dmlc-core/issues/372
> Please feel free to reply to this email or comment on the GitHub issue if you have any input.
>
> Anirudh
UTF-8 Support for TextParser
Hi all,

Currently there is no UTF-8 support in the LibSVM, LibFM, or CSV text parsers. I am working on adding UTF-8 support for the text parsers. Since C++ doesn't have great built-in support for UTF-8, I am looking at third-party libraries that provide Unicode support. I am currently considering ICU. Any comments, suggestions, past experience, or gotchas about third-party Unicode libraries or adding Unicode support in general are highly appreciated.

I have created an issue for this: https://github.com/dmlc/dmlc-core/issues/372
Please feel free to reply to this email or comment on the GitHub issue if you have any input.

Anirudh
Re: Refresh issues in master version - Need help from Apache Infra team
Hello,

I verified this issue, and I don't understand why some pages are not getting refreshed. Maybe there's a problem on Apache Infra's side. @Mentors, can we create a ticket to request assistance?

Best regards,
Marco

santhosh karuhatty wrote on Mon, Feb 26, 2018, 21:02:
> Hello All,
>
> Recently, we did a build of the MXNet website, and I am seeing an issue where pages are not serving the latest content.
>
> *Issue*: After switching to the master version, we don't see the 1.1.0 version listed in the Versions dropdown.
>
> *Repro details:*
>
> - Visit http://mxnet.incubator.apache.org/ and you will land on v1.1.0 in the Versions dropdown menu.
> - Switch to master from the dropdown and then check the list in Versions. Notice that 1.1.0 is not listed.
> - I manually verified that the code for listing 1.1.0 is in place in versions/master. Refer to https://github.com/apache/incubator-mxnet-site/blob/asf-site/versions/master/index.html#L99 where you can see 1.1.0 listed in index.html.
> - I also hosted the asf-site branch locally and can see 1.1.0 listed after switching to master (http://localhost/versions/master/index.html).
>
> Can someone from the infra team verify why there is a discrepancy here?
>
> Thanks
> -Santhosh
Refresh issues in master version - Need help from Apache Infra team
Hello All,

Recently, we did a build of the MXNet website, and I am seeing an issue where pages are not serving the latest content.

*Issue*: After switching to the master version, we don't see the 1.1.0 version listed in the Versions dropdown.

*Repro details:*

- Visit http://mxnet.incubator.apache.org/ and you will land on v1.1.0 in the Versions dropdown menu.
- Switch to master from the dropdown and then check the list in Versions. Notice that 1.1.0 is not listed.
- I manually verified that the code for listing 1.1.0 is in place in versions/master. Refer to https://github.com/apache/incubator-mxnet-site/blob/asf-site/versions/master/index.html#L99 where you can see 1.1.0 listed in index.html.
- I also hosted the asf-site branch locally and can see 1.1.0 listed after switching to master (http://localhost/versions/master/index.html).

Can someone from the infra team verify why there is a discrepancy here?

Thanks
-Santhosh