Re: Natural language parsing (NLP) with D

2015-10-23 Thread Laeeth Isharc via Digitalmars-d

On Tuesday, 20 October 2015 at 16:01:41 UTC, Chris wrote:

If anyone is interested in starting something like FreeLing in 
D, please share your thoughts.


Chris - please drop me a line.  I am sure there are some things 
we could work together on over time.


import std.stdio : writeln;
void main() {
    auto domain = "laeeth.com";
    auto user = "laeeth";
    writeln(user ~ "@" ~ domain);
}



Re: Natural language parsing (NLP) with D

2015-10-21 Thread bachmeier via Digitalmars-d

On Wednesday, 21 October 2015 at 06:34:44 UTC, Eliatto wrote:

On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
It's not my area, but are you thinking of something like 
Freeling?


http://nlp.lsi.upc.edu/freeling/
I think that in order to make a new wrapper more popular, it 
should be created with LGPL license (not GPL). Freeling is GPL.

Is YamCha worth revival in D?
http://chasen.org/~taku/software/yamcha/


The internet doesn't need another discussion about licensing. 
I'll just say that it depends. However, the most important 
factors when you currently have nothing to offer are:


- How complete is the library?
- How many people are using it?
- How easy is it to create the bindings?

From my conversations, Freeling does quite well on all counts. 
Maybe that won't work for you personally because you want to use 
it in a proprietary project. That's not a compelling reason to 
ignore it, though, as others might want to use it, and they may 
be willing to comply with the GPL.


Re: Natural language parsing (NLP) with D

2015-10-21 Thread Eliatto via Digitalmars-d

On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
It's not my area, but are you thinking of something like 
Freeling?


http://nlp.lsi.upc.edu/freeling/
I think that in order to make a new wrapper more popular, it 
should be created with LGPL license (not GPL). Freeling is GPL.

Is YamCha worth revival in D?
http://chasen.org/~taku/software/yamcha/


Re: Natural language parsing (NLP) with D

2015-10-21 Thread Chris via Digitalmars-d

On Tuesday, 20 October 2015 at 18:43:54 UTC, Laeeth Isharc wrote:


Hi.

I am very interested in this topic (especially sentiment 
analysis), and slowly I am getting a bit more firepower.  I 
started porting the Python version of the Stanford NLP API (the 
underlying code is Java) to D - it's not very complicated, but 
I have too much on my plate and so it goes slowly.


I would be interested in working together on this with others, 
and I don't mind open sourcing the building blocks (which is 
really the time consuming bit).  I hope to have some others 
from D world helping me, so it should go a bit faster, although 
the NLP stuff might not be the first project we work on.


Feel free to drop me an email. Laeeth


At kaleidicassociates.com


Thanks.


Laeeth


What exactly is sentiment analysis and how do you go about it?


Re: Natural language parsing (NLP) with D

2015-10-21 Thread Henry Gouk via Digitalmars-d

On Wednesday, 21 October 2015 at 09:09:27 UTC, Chris wrote:


What exactly is sentiment analysis and how do you go about it?


Determining whether the sentiment of a piece of text is positive, 
neutral, or negative. Currently Twitter is a pretty popular 
source of data in academia, as emoticons can be used as 
sufficiently accurate proxies for labels. Using pseudo-labelled 
tweets, one can then come up with a feature representation (e.g. 
bag of words, tf-idf) and use some sort of classifier (e.g. 
linear SVM or softmax regression) to determine the sentiment of 
novel tweets. This is a pretty simple approach, and probably not 
hard to improve on.
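
A minimal D sketch of the pipeline described above, under stated assumptions: the emoticon lists, the whitespace tokenizer, and the names pseudoLabel, bagOfWords, score and train are all made up for illustration. A plain perceptron stands in for the linear SVM or softmax regression, only because it fits in a few lines. Treat it as a toy, not a tuned classifier.

```d
import std.algorithm.searching : canFind;
import std.array : split;
import std.ascii : isAlpha;
import std.string : toLower;

/// Pseudo-label a tweet from its emoticons: +1 positive, -1 negative, 0 unknown.
/// The emoticon lists are illustrative assumptions.
int pseudoLabel(string tweet)
{
    if (tweet.canFind(":)") || tweet.canFind(":-)")) return 1;
    if (tweet.canFind(":(") || tweet.canFind(":-(")) return -1;
    return 0;
}

/// Bag of words: lower-cased token -> count. Tokens that do not start with
/// an ASCII letter (emoticons included) are dropped, so the label does not
/// leak into the features.
int[string] bagOfWords(string text)
{
    int[string] bag;
    foreach (word; text.toLower.split)
        if (isAlpha(word[0]))
            bag[word]++;
    return bag;
}

/// Linear score: dot product of the weights with the word counts.
double score(const double[string] w, const int[string] bag)
{
    double s = 0;
    foreach (word, n; bag)
        s += w.get(word, 0.0) * n;
    return s;
}

/// Perceptron training on pseudo-labelled tweets (a stand-in for the SVM or
/// softmax regression mentioned above). Unlabelled tweets are skipped.
double[string] train(string[] tweets, size_t epochs = 10)
{
    double[string] w;
    foreach (_; 0 .. epochs)
        foreach (t; tweets)
        {
            immutable y = pseudoLabel(t);
            if (y == 0) continue;
            auto bag = bagOfWords(t);
            if (score(w, bag) * y <= 0)          // misclassified: nudge weights
                foreach (word, n; bag)
                    w[word] = w.get(word, 0.0) + y * n;
        }
    return w;
}
```

After training on a handful of labelled tweets, score(w, bagOfWords(newTweet)) is positive for text that looks like the ":)" examples and negative for text that looks like the ":(" ones. Swapping the raw counts for tf-idf weights would only change bagOfWords.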


Re: Natural language parsing (NLP) with D

2015-10-20 Thread Rikki Cattermole via Digitalmars-d

On 21/10/15 1:01 AM, Eliatto wrote:

Hello! I am rather new to D ecosystem (I am a C++ developer). I know
that there are code-dlang and awesome-D collections of libraries. But I
have not found any NLP libraries in D
(https://github.com/jogojapan/drulex is not worth mentioning), though
there are Go and Rust NLP libraries on github (they are new languages
too). Why is this field unpopular among (D)evelopers?
What can be used for base POS tagging and NP chunking of English texts
instead? I mean wrapping some C/C++ library without porting. Which one
will cause minimal headache during glueing with D?
P.S. I suppose that it will be nice to see the histogram of libs using
"awesome-D" list. For example, one rectangle shows 3D engine
percentage(libs number divided by total awesome-D libs count and
multiplied by 100), another shows logger libs percentage...


The only thing I could find on code.dlang.org was 
https://github.com/Herringway/natcmp

Not really what you want I think.

In terms of binding C/C++ to D, you should be able to do it almost 
wholesale. You will need to create shims on the C++ side for certain 
features such as operator overloads, templates and of course object creation.
That is as far as I know, at least. There were some serious C++ interop 
improvements fairly recently (for DDMD), so somebody else will need to confirm.
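
To make the C route concrete, here is a small sketch that binds a real C function (strlen from the C standard library) by hand. For a C++ NLP library the same pattern would apply to extern "C" shim functions written on the C++ side; those shims would be your own code (names such as a hypothetical tagger_create are not from any real library). The wrapper name cLength is also just an illustration.

```d
import std.string : toStringz;

// Hand-written declaration of a C function. The linker resolves it against
// the C runtime, which D programs link with by default.
extern (C) size_t strlen(const(char)* s) @nogc nothrow;

/// Thin D wrapper that hides the conversion to a zero-terminated C string.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}
```

The D-side wrapper is where you would convert between D strings or slices and the raw pointers the C API expects, so the rest of the program never sees them.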


As for which C/C++ library you should build on? No idea. What 
would you like to use?


Also this would be better suited for D.learn.


Natural language parsing (NLP) with D

2015-10-20 Thread Eliatto via Digitalmars-d
Hello! I am rather new to the D ecosystem (I am a C++ developer). I 
know that there are code-dlang and awesome-D collections of 
libraries. But I have not found any NLP libraries in D 
(https://github.com/jogojapan/drulex is not worth mentioning), 
though there are Go and Rust NLP libraries on github (they are 
new languages too). Why is this field unpopular among 
(D)evelopers?
What can be used for basic POS tagging and NP chunking of English 
texts instead? I mean wrapping some C/C++ library without 
porting. Which one will cause minimal headache when gluing it 
to D?
P.S. I suppose that it would be nice to see a histogram of libs 
built from the "awesome-D" list. For example, one rectangle shows the 
3D engine percentage (number of such libs divided by the total 
awesome-D lib count and multiplied by 100), another shows the logger 
libs percentage...
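
The percentage in the P.S. is just count / total * 100 per category. A small D sketch, assuming the list has already been reduced to a category -> count table (the function name categoryShares and any counts you feed it are hypothetical, not taken from the real awesome-D list):

```d
/// Percentage share of each category: 100 * count / total.
/// Returns an empty table when there are no libraries at all.
double[string] categoryShares(const int[string] counts)
{
    int total = 0;
    foreach (n; counts)              // iterates over the values
        total += n;

    double[string] shares;
    if (total == 0) return shares;
    foreach (name, n; counts)
        shares[name] = 100.0 * n / total;
    return shares;
}
```

Each value in the result is the height of one rectangle in the histogram.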


Re: Natural language parsing (NLP) with D

2015-10-20 Thread ponce via Digitalmars-d

On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:

Why is this field unpopular among (D)evelopers?


We aren't numerous, so nobody has tackled NLP yet (or many 
other domains, for that matter). There is plenty of space 
to start domain-specific libraries.

You could do it :)


Re: Natural language parsing (NLP) with D

2015-10-20 Thread Laeeth Isharc via Digitalmars-d

On Tuesday, 20 October 2015 at 16:01:41 UTC, Chris wrote:

On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
It's not my area, but are you thinking of something like 
Freeling?


http://nlp.lsi.upc.edu/freeling/

Asking for a friend. I think a C++ expert could get it to work 
with D with little difficulty, at least by creating C 
bindings, but I'm not a C++ expert and I failed.


Interesting, I heard of it a while ago. In D I have the 
following:


Text tokenization

Yes.

Sentence splitting

Yes.

Morphological analysis

Yes.

Suffix treatment [, retokenization of clitic pronouns]

Yes.

Flexible multiword recognition

Yes.

Contraction splitting

Depends on what they mean. But I can handle contractions like 
"l'ami".


Probabilistic prediction of unknown word categories

No.

Phonetic encoding

Transcription? If so, yes.

SED-based search for similar words in dictionary

No.

Named entity detection

No.

Recognition of dates, numbers, ratios, currency, and physical 
magnitudes (speed, weight, temperature, density, etc.)


Partially implemented.

PoS tagging

Started.

Chart-based shallow parsing

No.

Named entity classification

No.

WordNet-based sense annotation and disambiguation

No.

Rule-based dependency parsing

No.

Nominal coreference resolution

No.

If anyone is interested in starting something like FreeLing in 
D, please share your thoughts.


Hi.

I am very interested in this topic (especially sentiment 
analysis), and slowly I am getting a bit more firepower.  I 
started porting the Python version of the Stanford NLP API (the 
underlying code is Java) to D - it's not very complicated, but I 
have too much on my plate and so it goes slowly.


I would be interested in working together on this with others, 
and I don't mind open sourcing the building blocks (which is 
really the time consuming bit).  I hope to have some others from 
D world helping me, so it should go a bit faster, although the 
NLP stuff might not be the first project we work on.


Feel free to drop me an email. Laeeth


At kaleidicassociates.com


Thanks.


Laeeth



Re: Natural language parsing (NLP) with D

2015-10-20 Thread bachmeier via Digitalmars-d

On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:
Hello! I am rather new to D ecosystem (I am a C++ developer). I 
know that there are code-dlang and awesome-D collections of 
libraries. But I have not found any NLP libraries in D 
(https://github.com/jogojapan/drulex is not worth mentioning), 
though there are Go and Rust NLP libraries on github (they are 
new languages too). Why is this field unpopular among 
(D)evelopers?
What can be used for base POS tagging and NP chunking of 
English texts instead? I mean wrapping some C/C++ library 
without porting. Which one will cause minimal headache during 
glueing with D?
P.S. I suppose that it will be nice to see the histogram of 
libs using "awesome-D" list. For example, one rectangle shows 
3D engine percentage(libs number divided by total awesome-D 
libs count and multiplied by 100), another shows logger libs 
percentage...


It's not my area, but are you thinking of something like Freeling?

http://nlp.lsi.upc.edu/freeling/

Asking for a friend. I think a C++ expert could get it to work 
with D with little difficulty, at least by creating C bindings, 
but I'm not a C++ expert and I failed.


Re: Natural language parsing (NLP) with D

2015-10-20 Thread Chris via Digitalmars-d

On Tuesday, 20 October 2015 at 15:49:18 UTC, bachmeier wrote:
It's not my area, but are you thinking of something like 
Freeling?


http://nlp.lsi.upc.edu/freeling/

Asking for a friend. I think a C++ expert could get it to work 
with D with little difficulty, at least by creating C bindings, 
but I'm not a C++ expert and I failed.


Interesting, I heard of it a while ago. In D I have the following:

Text tokenization

Yes.

Sentence splitting

Yes.

Morphological analysis

Yes.

Suffix treatment [, retokenization of clitic pronouns]

Yes.

Flexible multiword recognition

Yes.

Contraction splitting

Depends on what they mean. But I can handle contractions like 
"l'ami".


Probabilistic prediction of unknown word categories

No.

Phonetic encoding

Transcription? If so, yes.

SED-based search for similar words in dictionary

No.

Named entity detection

No.

Recognition of dates, numbers, ratios, currency, and physical 
magnitudes (speed, weight, temperature, density, etc.)


Partially implemented.

PoS tagging

Started.

Chart-based shallow parsing

No.

Named entity classification

No.

WordNet-based sense annotation and disambiguation

No.

Rule-based dependency parsing

No.

Nominal coreference resolution

No.

If anyone is interested in starting something like FreeLing in D, 
please share your thoughts.


Re: Natural language parsing (NLP) with D

2015-10-20 Thread Andrei Alexandrescu via Digitalmars-d

On 10/20/2015 08:01 AM, Eliatto wrote:

Hello! I am rather new to D ecosystem (I am a C++ developer). I know
that there are code-dlang and awesome-D collections of libraries. But I
have not found any NLP libraries in D
(https://github.com/jogojapan/drulex is not worth mentioning), though
there are Go and Rust NLP libraries on github (they are new languages
too). Why is this field unpopular among (D)evelopers?
What can be used for base POS tagging and NP chunking of English texts
instead? I mean wrapping some C/C++ library without porting. Which one
will cause minimal headache during glueing with D?
P.S. I suppose that it will be nice to see the histogram of libs using
"awesome-D" list. For example, one rectangle shows 3D engine
percentage(libs number divided by total awesome-D libs count and
multiplied by 100), another shows logger libs percentage...


In my NLP days I remember the common procedure was to run 
taggers/chunkers/etc. as processes driven by scripts. That said, a 
library offers more options, and it would be interesting to see one on 
code.dlang.org. -- Andrei
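
The process-driven approach can also be done from D itself with std.process. A minimal sketch, assuming the external tool reads text on stdin and writes its result to stdout; the function name runTool is made up, and the command you pass in is whatever tagger or chunker you actually have installed:

```d
import std.exception : enforce;
import std.process : pipeProcess, Redirect, wait;

/// Runs an external tool, feeds `input` to its stdin and returns its stdout.
/// Note: all input is written before any output is read, so this is only
/// suitable for modest inputs; a tool that emits a lot of output while still
/// reading could block.
string runTool(string[] command, string input)
{
    auto pipes = pipeProcess(command, Redirect.stdin | Redirect.stdout);
    pipes.stdin.write(input);
    pipes.stdin.close();             // signals end of input to the tool

    string output;
    foreach (line; pipes.stdout.byLine)
        output ~= line.idup ~ "\n";

    enforce(wait(pipes.pid) == 0, "external tool failed");
    return output;
}
```

On a POSIX system, runTool(["cat"], "The cat sat.\n") simply echoes the text back, which is a convenient way to check the plumbing before pointing it at a real tagger.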


Re: Natural language parsing (NLP) with D

2015-10-20 Thread Chris via Digitalmars-d

On Tuesday, 20 October 2015 at 12:01:44 UTC, Eliatto wrote:

Why is this field unpopular among (D)evelopers?


I work with NLP almost all the time and D is very well suited for 
it. It's mainly text-to-speech stuff, but I have a tiny POS 
tagger (or rather POS identifier) as well.


D would be well suited for creating higher-level, simpler rule 
languages that linguists who have no clue about programming could 
easily use. I've been thinking about this for a while now, and I 
wish I had the time to come up with something and implement it. 
I'm thinking of a suite that would cater for the various aspects 
of NLP, e.g. phonemic transcriptions, POS tagging, morphological 
and grammatical analysis, collocation etc. A one-stop shop for 
linguists. But, alas, time is scarce.


If you have any ideas, please share.