Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-03 Thread NOKUBI Takatsugu
On Thu, 03 Oct 2019 13:16:53 +0900,
Mo Zhou wrote:
> Your copyright file is not complete
> https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/debian/copyright

Thank you for pointing that out.

Fortunately, the data directory is not essential. The software is
focused on building your own tokenizer model.

> Besides, the packaging of tensorflow is stalled, as it's difficult
> to tame the 4.5 million lines of code without a usable build system.
> For a long time the users (including myself) have to (somewhat)
> depend on third party ecosystems until the day Google started to
> rethink about distribution integration (basically hopeless).

I agree. It seems too complex, and its development moves quite fast.

> Apart from the science team, you are welcome to join the deep learning
> team as well: https://salsa.debian.org/deeplearning-team
> (it's an informal team)

Ok, I sent a request.



Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-02 Thread Mo Zhou
(re-sent due to incorrect CC address in last post)
Hi NOKUBI,

Thank you for working on this.
Although it may sound boring or even frustrating, the data used to train
machine learning models, as well as pre-trained models themselves,
must be handled carefully.

Your copyright file is not complete
https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/debian/copyright

At least one file in the data/ directory is not Apache-2.0 licensed:
https://github.com/google/sentencepiece/blob/master/data/botchan.txt#L1-L12
https://github.com/google/sentencepiece/blob/master/data/Scripts.txt#L1-L13

I'm also wondering whether the Japanese poetry book is free
(I don't speak Japanese, but from the "Chinese characters" within the
text I guess it's a poetry book):
https://raw.githubusercontent.com/google/sentencepiece/master/data/wagahaiwa_nekodearu.txt
Its publisher is 青空文庫 (Aozora Bunko). Please confirm the copyright
information for this book and its DFSG compliance.

When a source package contains DFSG-incompatible material, a common
practice in Debian is to strip those components from the original
tarball and append +dfsg to the upstream version string. However,
data-driven applications can become useless once the training data is
removed. This is an awkward conflict in practice between the
free software world and the academic machine learning (computational
linguistics) community.
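
As a rough illustration of the repacking practice described above, here is a minimal Python sketch. The member list, the exclusion pattern and the helper names (`repack_members`, `dfsg_version`) are assumptions made up for this example, and whether the whole data/ directory should be excluded is not decided in this thread. In real packaging the repack is normally done by tools such as uscan or mk-origtargz rather than by hand-written code.

```python
from fnmatch import fnmatch

def repack_members(members, excluded_patterns):
    """Return the tarball members that survive the DFSG repack."""
    return [m for m in members
            if not any(fnmatch(m, pat) for pat in excluded_patterns)]

def dfsg_version(upstream_version, debian_revision="1"):
    """Append the +dfsg suffix to the upstream version, then the Debian revision."""
    return f"{upstream_version}+dfsg-{debian_revision}"

# Hypothetical member list and exclusion pattern, for illustration only.
members = ["src/trainer.cc", "data/botchan.txt", "data/Scripts.txt", "README.md"]
kept = repack_members(members, ["data/*"])
print(kept)
print(dfsg_version("0.1.83"))
```

The suffix goes after the upstream version and before the Debian revision, which is why the package in this request is versioned 0.1.83+dfsg-1.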

Besides, the packaging of tensorflow is stalled, as it's difficult
to tame the 4.5 million lines of code without a usable build system.
For a long time, users (including myself) will have to depend (somewhat)
on third-party ecosystems, until the day Google starts to
rethink distribution integration (basically hopeless).

Apart from the science team, you are welcome to join the deep learning
team as well: https://salsa.debian.org/deeplearning-team
(it's an informal team)

On 2019-10-03 02:37, NOKUBI Takatsugu wrote:
> On Wed, 02 Oct 2019 14:52:23 +0900,
> Kentaro Hayashi wrote:
>>  * Vcs : https://salsa.debian.org/debian/sentencepiece
> 
> It contains tensorflow binding, so I think it will be good to belong
> with Debian Science Team.
> 
> I, hayashi-san, and tsuchiya-san sent requests to join the team.
> tsuchiya-san also maintained it himself, so I'll merge them into
> the salsa repository.
> 
> https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/



Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-02 Thread NOKUBI Takatsugu
On Wed, 02 Oct 2019 14:52:23 +0900,
Kentaro Hayashi wrote:
>  * Vcs : https://salsa.debian.org/debian/sentencepiece

It contains a tensorflow binding, so I think it would be good for it to
belong to the Debian Science Team.

hayashi-san, tsuchiya-san, and I have sent requests to join the team.
tsuchiya-san has also been maintaining a package himself, so I'll merge
his work into the salsa repository.

https://bitbucket.org/tsuchm/pkg-sentencepiece/src/master/



Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-02 Thread Kentaro Hayashi
Hi,

Thank you for reviewing. It is very helpful.

On Wed, 2 Oct 2019 23:13:45 +0200 Adam Borowski wrote:
> On Wed, Oct 02, 2019 at 02:52:23PM +0900, Kentaro Hayashi wrote:
> >  * Package name: sentencepiece
> >    Version : 0.1.83+dfsg-1
> >    Upstream Author : Google Inc.
> >  * URL : https://github.com/google/sentencepiece
[snip]

> Hi!
> The runtime library package (ie, libsentencepiece) should have soname
> appended.  This might be not a big change, but amending this later would
> require a trip through NEW, thus let's get it right from the start.
> 
> The watch file also should mangle the version, to include +dfsg.
> 
> It is inappropriate to make lintian overrides for real bugs (like a lack of
> manpage).  You are not required to fix everything immediately -- heck, there
> are many bugs that don't get fixed ever -- but that's not a valid use for
> overrides.  They're for false positives.
> 
> But overall, the package seems almost ready.

I was able to find a sponsor (@knok) and a collaborator (@tsuchm)
personally, so I'll fix the above issues with them. Thanks!
 
Regards,



Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-02 Thread Adam Borowski
On Wed, Oct 02, 2019 at 02:52:23PM +0900, Kentaro Hayashi wrote:
>  * Package name: sentencepiece
>    Version : 0.1.83+dfsg-1
>    Upstream Author : Google Inc.
>  * URL : https://github.com/google/sentencepiece

> It builds those binary packages:
> 
>   libsentencepiece-dev - Development package for libsentencepiece
>   libsentencepiece - Unsupervised text tokenizer for Neural Network-based text generation
>   sentencepiece-tools - Utility package for SentencePiece

Hi!
The runtime library package (i.e., libsentencepiece) should have the soname
appended.  This might not be a big change, but amending this later would
require a trip through NEW, thus let's get it right from the start.
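
To make the naming convention concrete, here is a minimal Python sketch of how a runtime package name is commonly derived from a library's SONAME, following the convention described in Debian Policy. The function name `package_name_from_soname` is made up for this example, and `libsentencepiece.so.0` is an assumed SONAME; the real value has to be read from the built library.

```python
import re

def package_name_from_soname(soname):
    """Derive a Debian runtime package name from a shared library SONAME."""
    # If the library name ends in a digit, separate it from the
    # SONAME version with a hyphen: libfoo2.so.3 -> libfoo2-3
    name = re.sub(r"([0-9])\.so\.", r"\1-", soname)
    # Otherwise just drop the ".so." part: libfoo.so.0 -> libfoo0
    name = re.sub(r"\.so(\.|$)", "", name)
    # Package names are lowercase and use hyphens, not underscores.
    return name.replace("_", "-").lower()

# Assumed SONAME, for illustration only.
print(package_name_from_soname("libsentencepiece.so.0"))
```

Because the binary package name changes whenever the SONAME changes, renaming it after the first upload means a new binary package, hence the extra trip through NEW.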

The watch file also should mangle the version, to include +dfsg.
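
The idea behind the mangling can be sketched in Python as follows. This is only equivalent logic under my own assumptions, not watch-file syntax: in a real debian/watch file the rule is written as a uscan option (for example a dversionmangle substitution), and the helper names `upstream_part` and `is_up_to_date` are hypothetical.

```python
import re

def upstream_part(debian_version):
    """Strip the Debian revision and any +dfsg repack suffix."""
    # Everything after the last hyphen (if any) is the Debian revision.
    without_revision = debian_version.rsplit("-", 1)[0]
    # Drop a trailing "+dfsg", optionally followed by digits (e.g. "+dfsg1").
    return re.sub(r"\+dfsg\d*$", "", without_revision)

def is_up_to_date(debian_version, upstream_version):
    """True when the packaged version matches the upstream release."""
    return upstream_part(debian_version) == upstream_version

print(upstream_part("0.1.83+dfsg-1"))
print(is_up_to_date("0.1.83+dfsg-1", "0.1.83"))
```

Without such a rule, the repacked version string and the plain upstream release string are compared as written, so the +dfsg suffix gets in the way of matching them.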

It is inappropriate to make lintian overrides for real bugs (like a lack of
manpage).  You are not required to fix everything immediately -- heck, there
are many bugs that don't get fixed ever -- but that's not a valid use for
overrides.  They're for false positives.

But overall, the package seems almost ready.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol,
⣾⠁⢠⠒⠀⣿⡁ 1kg raspberries, 0.4kg sugar; put into a big jar for 1 month.
⢿⡄⠘⠷⠚⠋⠀ Filter out and throw away the fruits (can dump them into a cake,
⠈⠳⣄ etc), let the drink age at least 3-6 months.



Bug#941569: RFS: sentencepiece/0.1.83+dfsg-1 [ITP] -- Unsupervised text tokenizer for Neural Network-based text generation

2019-10-01 Thread Kentaro Hayashi
Package: sponsorship-requests
Severity: wishlist

Dear mentors,

I am looking for a sponsor for my package "sentencepiece"

 * Package name: sentencepiece
   Version : 0.1.83+dfsg-1
   Upstream Author : Google Inc.
 * URL : https://github.com/google/sentencepiece
 * License : Apache-2.0
 * Vcs : https://salsa.debian.org/debian/sentencepiece
   Section : libs

It builds those binary packages:

  libsentencepiece-dev - Development package for libsentencepiece
  libsentencepiece - Unsupervised text tokenizer for Neural Network-based text generation
  sentencepiece-tools - Utility package for SentencePiece

To access further information about this package, please visit the following URL:

  https://mentors.debian.net/package/sentencepiece

Alternatively, one can download the package with dget using this command:

  dget -x https://mentors.debian.net/debian/pool/main/s/sentencepiece/sentencepiece_0.1.83+dfsg-1.dsc

Changes since the last upload:

   * Initial release (Closes: #939860)

Regards,