Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-12 Thread Benson Margulies
Basis Technology's toolset includes sentence boundary detectors. Please
contact me for more details.



Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-12 Thread Aman Tandon
Hi,

Is there a semantic analyzer in Solr?


RE: Is there any sentence tokenizers in sold 4.9.0?

2014-09-11 Thread Susheel Kumar
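Just as an FYI, you may want to try the sentence detection tokenizer from the
OpenNLP analysis capabilities proposed for Solr. Note that this is available as
a patch on the issue below rather than as part of the stock 4.9 release: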

https://issues.apache.org/jira/browse/LUCENE-2899


This e-mail message may contain confidential or legally privileged information 
and is intended only for the use of the intended recipient(s). Any unauthorized 
disclosure, dissemination, distribution, copying or the taking of any action in 
reliance on the information herein is prohibited. E-mails are not secure and 
cannot be guaranteed to be error free as they can be intercepted, amended, or 
contain viruses. Anyone who communicates with us by e-mail is deemed to have 
accepted these risks. The Digital Group is not responsible for errors or 
omissions in this message and denies any responsibility for any damage arising 
from the use of e-mail. Any opinion defamatory or deemed to be defamatory or  
any material which could be reasonably branded to be a species of plagiarism 
and other statements contained in this message and any attachment are solely 
those of the author and do not necessarily represent those of the company.


Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-11 Thread Sandeep B A
Hi All,
Sorry for the delayed response.
I was out of the office for the last few days and was not able to reply.
Thanks for the information.

We have a use case where one sentence is the unit token on which we need
to run normalization and semantic analysis.
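
To make "one sentence as the unit token" concrete, here is a minimal sketch of per-sentence normalization. The thread does not say which normalization is meant, so the steps below (Unicode NFKC, case folding, whitespace collapsing) are assumptions, and the function names are hypothetical. The input is a list of sentences that some sentence detector has already produced.

```python
import unicodedata


def normalize_sentence(sentence):
    """Illustrative normalization (assumed steps, not from the thread):
    Unicode NFKC, case folding, and collapsing runs of whitespace."""
    folded = unicodedata.normalize("NFKC", sentence).casefold()
    return " ".join(folded.split())


def normalize_sentences(sentences):
    """Treat each sentence as one unit and normalize it independently."""
    return [normalize_sentence(s) for s in sentences]
```

Keeping the normalizer separate from the sentence detector means either piece can be swapped out later without touching the other.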

We still need to finalize the type of normalizer and analyzer, but I was trying
to see whether Solr has any built-in libraries, so that no cross-language
integration would be required.

I will get back to you once I know whether something works or not.

@susheel,
Thanks, I will try to see if that works.

Thanks,
Sandeep.


Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-08 Thread Sandeep B A
Hi Susheel,
Thanks for the information.
I have crawled a few websites, and all I need is a sentence tokenizer for the
data I have collected.
These websites are English only.

I don't have experience writing custom sentence tokenizers for
Solr. Is there a tutorial that explains how to do it?

Is it possible to integrate NLTK with Solr? If so, how? I ask because I
found sentence tokenizers for English in NLTK.
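
For reference, the NLTK sentence tokenizer mentioned here is `nltk.tokenize.sent_tokenize`. The sketch below calls it on English text and, as an assumption made only for illustration, falls back to a naive regular-expression split when NLTK or its Punkt model data is not installed. The helper names are hypothetical, and the sketch only covers the Python side; it does not connect to Solr.

```python
import re


def regex_sentences(text):
    """Naive fallback: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def english_sentences(text):
    """Prefer NLTK's sent_tokenize when it is usable, else use the fallback."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text, language="english")
    except (ImportError, LookupError):
        # NLTK is missing, or its Punkt model data has not been downloaded.
        return regex_sentences(text)
```

The fallback is much cruder than NLTK: it splits on abbreviations and on version strings such as "4.9.0".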

Thanks,
Sandeep


Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-08 Thread Jack Krupansky
Out of curiosity, what would be an example query for your application that 
would depend on sentence tokenization, as opposed to simple term 
tokenization? I mean, there are no sentence-based query operators in the 
Solr query parsers.


-- Jack Krupansky



RE: Is there any sentence tokenizers in sold 4.9.0?

2014-09-08 Thread Susheel Kumar
Sandeep,

As Jack mentioned, it would be useful to know the use case and what kind of
queries you will be executing, since you may also need to handle sentences on
the query side, not just on the indexing side. For integrating with NLTK there
are different options, such as calling NLTK out of process or using jythonc to
generate Java classes.
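
The out-of-process option could look roughly like the Python worker sketched below. The line-based protocol (one document per input line, one JSON array of sentences per output line) is an assumption made for illustration, not an existing Solr or NLTK interface, and `split_sentences` is a placeholder where an NLTK call would go. The Java side would start this script as a child process and exchange lines with it.

```python
import json
import re


def split_sentences(text):
    """Placeholder splitter; an NLTK-based splitter could replace this."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def serve(in_stream, out_stream):
    """Read one document per line and answer with one JSON array per line."""
    for line in in_stream:
        out_stream.write(json.dumps(split_sentences(line)) + "\n")
        out_stream.flush()  # the calling process is waiting for each reply


# As a script entry point, this would be run as: serve(sys.stdin, sys.stdout)
```

Flushing after every reply matters here because the caller blocks on each answer before sending the next document.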

Thanks




Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Sandeep B A
Hi,

I was looking for sentence tokenizer options that ship with Solr by default, but
could not find any. Has anyone used one, or integrated a tokenizer from another
language (for example Python) into Solr? Please let me know.


Thanks and regards,
Sandeep


Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Sandeep B A
Sorry for the typo: it is Solr 4.9.0, not sold 4.9.0.


RE: Is there any sentence tokenizers in sold 4.9.0?

2014-09-05 Thread Susheel Kumar
There is SmartChineseSentenceTokenizerFactory (SentenceTokenizer), which is
being deprecated and replaced by HMMChineseTokenizer. I am not aware of any
other sentence tokenizer, but you may either build your own, similar to
SentenceTokenizer, or take an external sentence detector/recognizer and build a
Solr tokenizer on top of it.

I don't know how complex your use case is, but I would suggest looking at
SentenceTokenizer and creating a similar tokenizer.
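
As a language-neutral illustration of what such a tokenizer has to produce, the Python sketch below emits each sentence together with its start and end character offsets, which is the kind of information a Lucene tokenizer reports for every token. It is not a port of SentenceTokenizer; the terminator set and the function name are assumptions chosen for the example.

```python
import re

# One sentence: a non-space character, then everything up to and including
# a run of terminators. The terminator set ". ! ?" is an assumption.
_SENTENCE = re.compile(r"\S[^.!?]*[.!?]*")


def sentence_tokens(text):
    """Yield (start, end, sentence) so that text[start:end] == sentence."""
    for match in _SENTENCE.finditer(text):
        sentence = match.group().rstrip()  # drop trailing whitespace, if any
        yield match.start(), match.start() + len(sentence), sentence
```

A rule this simple also breaks at every period inside strings such as "4.9.0", so a real tokenizer would need extra handling for abbreviations and numbers.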

Thanks,
Susheel

