Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
On Thu, 03 Oct 2019 13:16:53 +0900, Mo Zhou wrote:
> Your copyright file is not complete
> https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/debian/copyright

Thank you for pointing that out. Fortunately, the data directory is not essential. The software is focused on building your own tokenizer model.

> Besides, the packaging of tensorflow is stalled, as it's difficult
> to tame the 4.5 million lines of code without a usable build system.
> For a long time the users (including myself) have to (somewhat)
> depend on third party ecosystems until the day Google started to
> rethink about distribution integration (basically hopeless).

I agree. It seems too complex, and its development moves quite fast.

> Apart from the science team, you are welcome to join the deep learning
> team as well: https://salsa.debian.org/deeplearning-team
> (it's an informal team)

OK, I sent a request.
Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
(re-sent due to incorrect CC address in last post)

Hi NOKUBI,

Thank you for working on this. Although it may sound boring or even frustrating, data used for training machine learning models, as well as pre-trained machine learning models, should be handled carefully.

Your copyright file is not complete
https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/debian/copyright

At least one file in the data/ directory is not Apache-2.0 licensed:
https://github.com/google/sentencepiece/blob/master/data/botchan.txt#L1-L12
https://github.com/google/sentencepiece/blob/master/data/Scripts.txt#L1-L13

and I'm wondering whether the Japanese poetry book is free (I don't speak Japanese, but from the "Chinese characters" within the text I guess it's a poetry book):
https://raw.githubusercontent.com/google/sentencepiece/master/data/wagahaiwa_nekodearu.txt
as its publisher is 青空文庫. Please confirm the copyright information for this book and its DFSG compliance.

When there is DFSG-incompatible material in a source package, a common practice in Debian is to strip those components from the original tarball and append +dfsg to the upstream version string. However, data-driven applications can become useless once the training data is removed... This is an awkward difficulty, or rather a practical conflict, between the free software world and the academic machine learning (computational linguistics) community.

Besides, the packaging of tensorflow is stalled, as it's difficult to tame the 4.5 million lines of code without a usable build system. For a long time the users (including myself) have to (somewhat) depend on third party ecosystems until the day Google started to rethink about distribution integration (basically hopeless).

Apart from the science team, you are welcome to join the deep learning team as well: https://salsa.debian.org/deeplearning-team
(it's an informal team)

On 2019-10-03 02:37, NOKUBI Takatsugu wrote:
> On Wed, 02 Oct 2019 14:52:23 +0900,
> Kentaro Hayashi wrote:
>> * Vcs : https://salsa.debian.org/debian/sentencepiece
>
> It contains tensorflow binding, so I think it will be good to belong
> with Debian Science Team.
>
> I, hayashi-san, and tsuchiya-san sent requests to join the team.
> tsuchiya-san also maintained it himself, so I'll merge them into
> the salsa repository.
>
> https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/
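To make the +dfsg convention mentioned above concrete, here is a minimal Python sketch. The function names are hypothetical, and the regular expression is only an approximation of what a version-mangling rule in a watch file would do; it ignores epochs and other corner cases of Debian version syntax.

```python
import re

# Matches a trailing repack marker such as "+dfsg" or "+dfsg1".
# The exact pattern is an assumption made for this illustration.
DFSG_SUFFIX = re.compile(r"\+dfsg[0-9.]*$")


def repacked_version(upstream: str, debian_revision: str = "1") -> str:
    """Build the Debian version for a tarball repacked for DFSG reasons."""
    return f"{upstream}+dfsg-{debian_revision}"


def upstream_part(debian_version: str) -> str:
    """Recover the plain upstream version from a repacked Debian version."""
    # Drop the Debian revision (everything after the last hyphen), if any.
    if "-" in debian_version:
        debian_version = debian_version.rpartition("-")[0]
    # Drop the repack marker so the result compares equal to upstream's tag.
    return DFSG_SUFFIX.sub("", debian_version)


print(repacked_version("0.1.83"))
print(upstream_part("0.1.83+dfsg-1"))
```

The second function is the direction a watch file needs: the packaged version has to be reduced to the plain upstream number before it can be compared with new upstream releases.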
Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
On Wed, 02 Oct 2019 14:52:23 +0900, Kentaro Hayashi wrote:
> * Vcs : https://salsa.debian.org/debian/sentencepiece

It contains tensorflow binding, so I think it will be good to belong with Debian Science Team.

I, hayashi-san, and tsuchiya-san sent requests to join the team. tsuchiya-san also maintained it himself, so I'll merge them into the salsa repository.

https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/
Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
Hi,

Thank you for reviewing. It is very helpful.

On Wed, 2 Oct 2019 23:13:45 +0200 Adam Borowski wrote:
> On Wed, Oct 02, 2019 at 02:52:23PM +0900, Kentaro Hayashi wrote:
> > * Package name: sentencepiece
> >   Version : 0.1.83+dfsg-1
> >   Upstream Author : Google Inc.
> > * URL : https://github.com/google/sentencepiece

snip

> Hi!
> The runtime library package (ie, libsentencepiece) should have soname
> appended. This might be not a big change, but amending this later would
> require a trip through NEW, thus let's get it right from the start.
>
> The watch file also should mangle the version, to include +dfsg.
>
> It is inappropriate to make lintian overrides for real bugs (like a lack of
> manpage). You are not required to fix everything immediately -- heck, there
> are many bugs that don't get fixed ever -- but that's not a valid use for
> overrides. They're for false positives.
>
> But overall, the package seems almost ready.

I was able to find a sponsor personally (@knok) and a collaborator (@tsuchm), so I'll fix the above issues with them.

Thanks!!!

Regards,
Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
On Wed, Oct 02, 2019 at 02:52:23PM +0900, Kentaro Hayashi wrote:
> * Package name: sentencepiece
>   Version : 0.1.83+dfsg-1
>   Upstream Author : Google Inc.
> * URL : https://github.com/google/sentencepiece

> It builds those binary packages:
>
> libsentencepiece-dev - Development package for libsentencepiece
> libsentencepiece - Unsupervised text tokenizer for Neural Network-based
> text generation
> sentencepiece-tools - Utility package for SentencePiece

Hi!
The runtime library package (ie, libsentencepiece) should have soname appended. This might be not a big change, but amending this later would require a trip through NEW, thus let's get it right from the start.

The watch file also should mangle the version, to include +dfsg.

It is inappropriate to make lintian overrides for real bugs (like a lack of manpage). You are not required to fix everything immediately -- heck, there are many bugs that don't get fixed ever -- but that's not a valid use for overrides. They're for false positives.

But overall, the package seems almost ready.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀ A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol,
⣾⠁⢠⠒⠀⣿⡁ 1kg raspberries, 0.4kg sugar; put into a big jar for 1 month.
⢿⡄⠘⠷⠚⠋⠀ Filter out and throw away the fruits (can dump them into a cake,
⠈⠳⣄ etc), let the drink age at least 3-6 months.
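As an illustration of the soname point in the review above, the following Python sketch derives a runtime package name from a shared library's SONAME using the usual Debian naming convention (library name plus SONAME version, with a hyphen inserted when the name already ends in a digit). The function name is hypothetical, and the SONAME "libsentencepiece.so.0" is assumed for the example rather than taken from the actual build.

```python
import re

# A versioned SONAME looks like "<name>.so.<version>".
SONAME_PATTERN = re.compile(r"(?P<name>.+)\.so\.(?P<version>[0-9][0-9A-Za-z.]*)")


def runtime_package_name(soname: str) -> str:
    """Derive a runtime library package name from a versioned SONAME."""
    match = SONAME_PATTERN.fullmatch(soname)
    if match is None:
        raise ValueError(f"not a versioned SONAME: {soname!r}")
    name = match.group("name")
    version = match.group("version")
    # Avoid ambiguous names such as "libfoo23" when the name ends in a digit.
    separator = "-" if name[-1].isdigit() else ""
    # Package names are lowercase and do not contain underscores.
    return f"{name}{separator}{version}".lower().replace("_", "-")


# Assumed SONAME, for illustration only.
print(runtime_package_name("libsentencepiece.so.0"))
```

Because the binary package name changes whenever the SONAME version changes, choosing the versioned name from the first upload avoids a later rename and the extra trip through NEW that the review mentions.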
Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation
Package: sponsorship-requests
Severity: wishlist

Dear mentors,

I am looking for a sponsor for my package "sentencepiece"

 * Package name    : sentencepiece
   Version         : 0.1.83+dfsg-1
   Upstream Author : Google Inc.
 * URL             : https://github.com/google/sentencepiece
 * License         : Apache-2.0
 * Vcs             : https://salsa.debian.org/debian/sentencepiece
   Section         : libs

It builds those binary packages:

  libsentencepiece-dev - Development package for libsentencepiece
  libsentencepiece - Unsupervised text tokenizer for Neural Network-based text generation
  sentencepiece-tools - Utility package for SentencePiece

To access further information about this package, please visit the following URL:

  https://mentors.debian.net/package/sentencepiece

Alternatively, one can download the package with dget using this command:

  dget -x https://mentors.debian.net/debian/pool/main/s/sentencepiece/sentencepiece_0.1.83+dfsg-1.dsc

Changes since the last upload:

  * Initial release (Closes: #939860)

Regards,
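The dget URL in the request above follows the usual pool layout, which can be sketched in Python as below. The function names are hypothetical, and the layout rule (first letter of the source name, or the first four characters for names starting with "lib") is assumed to apply to mentors.debian.net in the same way as to the main archive.

```python
def pool_prefix(source: str) -> str:
    """Return the pool subdirectory for a source package name."""
    # Sources starting with "lib" are grouped by their first four characters.
    return source[:4] if source.startswith("lib") else source[:1]


def dsc_url(source: str, version: str,
            base: str = "https://mentors.debian.net/debian",
            component: str = "main") -> str:
    """Build the URL of the .dsc file for a source package and version."""
    # File names never carry the epoch, so drop anything before the colon.
    file_version = version.split(":", 1)[-1]
    return (f"{base}/pool/{component}/{pool_prefix(source)}/"
            f"{source}/{source}_{file_version}.dsc")


print(dsc_url("sentencepiece", "0.1.83+dfsg-1"))
```

Passing the printed URL to dget -x would fetch and unpack the same source package as the command quoted in the request.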