indexing unique keys

2014-09-04 Thread Mark , N
I have a use case where we want to store unique keys (hashes) which would be
used to compare against another set of keys (hashes).

For example

Index set = { h1, h2, h3, h4 }

Comparison set = { h1, h2 }

Result set = { h1, h2 }

Would it be an advantage to store the "index set" in Solr instead of storing
it in a traditional database?
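
For reference, the comparison described above is a plain set intersection. A
minimal sketch in Python, assuming the hashes are strings and both sets fit in
memory (the names matching_keys, index_set and comparison_set are illustrative
only and say nothing about how Solr or a database would store them):

```python
# Illustrative only: h1..h4 stand in for real hash strings.
index_set = {"h1", "h2", "h3", "h4"}
comparison_set = {"h1", "h2"}


def matching_keys(index_keys, candidate_keys):
    """Return the candidate keys that are also present in the index, sorted."""
    return sorted(set(index_keys) & set(candidate_keys))


result_set = matching_keys(index_set, comparison_set)
print(result_set)  # ['h1', 'h2']
```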

Thanks in advance






*Nipen Mark *


search hit on multivalued fields

2012-08-03 Thread Mark , N
I have a multivalued field "Text" which is indexed, for example:

F1: some value
F2: some value
Text = (content of F1, F2)

When a user searches, I check only the "Text" field, but I also need to
show users which field (F1 or F2) produced the search hit.
Is this possible in Solr?
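
One approach that might work here is Solr highlighting: search on "Text" but
ask for highlights on F1 and F2, then look at which of those fields came back
with snippets. The sketch below only builds the request and reads a
highlighting section shaped like Solr's JSON response. It assumes F1 and F2 are
stored fields; the helper names and the sample response are made up for
illustration, and nothing in this thread confirms the setup.

```python
from urllib.parse import urlencode


def build_highlight_query(term, fields=("F1", "F2")):
    """Build a select URL path that searches Text and highlights the source fields."""
    params = {
        "q": f"Text:{term}",
        "hl": "true",
        "hl.fl": ",".join(fields),
        "wt": "json",
    }
    return "/select?" + urlencode(params)


def fields_with_hits(highlighting):
    """Map each document id to the fields that returned at least one snippet."""
    hits = {}
    for doc_id, per_field in highlighting.items():
        hits[doc_id] = sorted(f for f, snippets in per_field.items() if snippets)
    return hits


# Hypothetical highlighting section of a response, for illustration only.
sample = {"doc1": {"F1": ["some <em>value</em>"]}, "doc2": {}}
print(build_highlight_query("value"))
print(fields_with_hits(sample))
```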


-- 
Thanks,

*Nipen Mark *


Re: filtering number and repeated contents

2012-06-07 Thread Mark , N
Thanks Jack, I will try an update processor.

By the way, does Solr store the tokenized content of a field if the field has
the property stored="true"?







On Tue, Jun 5, 2012 at 8:23 PM, Jack Krupansky wrote:

> My (very limited) understanding of "boilerpipe" in Tika is that it strips
> out "short text", which is great for all the menu and navigation text, but
> the typical disclaimer at the bottom of an email is not very short and
> frequently can be longer than the email message body itself. You may have
> to resort to a custom update processor that is programmed with some
> disclaimer signature text strings to be removed from field values.
>
> -- Jack Krupansky
>
> -Original Message- From: Mark , N
> Sent: Tuesday, June 05, 2012 8:28 AM
> To: solr-user@lucene.apache.org
> Subject: filtering number and repeated contents
>
>
> Is it possible to filter out numbers and disclaimers (repeated content)
> while indexing to Solr?
> This is all surplus information and I do not want to index it.
>
> I have tried using the boilerpipe algorithm as well to remove surplus
> information from web pages, such as navigational elements, templates, and
> advertisements. I think it works well, but I would like to see if I
> could also filter out "disclaimer" information, mainly in email texts.
> --
> Thanks,
>
> *Nipen Mark *
>



-- 
Thanks,

*Nipen Mark *


filtering number and repeated contents

2012-06-05 Thread Mark , N
Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information and I do not want to index it.

I have tried using the boilerpipe algorithm as well to remove surplus
information from web pages, such as navigational elements, templates, and
advertisements. I think it works well, but I would like to see if I
could also filter out "disclaimer" information, mainly in email texts.
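
To make the idea concrete, here is a rough client-side sketch of that kind of
filtering, done before the text is sent to Solr. It assumes the disclaimer sits
at the end of the message and starts with a known marker string; the marker
below and the function name strip_surplus are hypothetical, and this is not a
Solr or Tika feature.

```python
import re

# Hypothetical disclaimer signature; real ones would come from your own mail corpus.
DISCLAIMER_MARKERS = [
    "This e-mail and any attachments are confidential",
]

# Standalone numbers such as 10, 3.5 or 1,000 (digits glued to letters are kept).
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def strip_surplus(text, markers=DISCLAIMER_MARKERS):
    """Cut the text at the first disclaimer marker and drop standalone numbers."""
    cut = len(text)
    for marker in markers:
        pos = text.find(marker)
        if pos != -1:
            cut = min(cut, pos)
    body = NUMBER_RE.sub(" ", text[:cut])
    # Collapse the whitespace left behind by the removals.
    return " ".join(body.split())


mail = (
    "Meeting at 10 tomorrow.\n"
    "This e-mail and any attachments are confidential and intended only for the addressee."
)
print(strip_surplus(mail))  # Meeting at tomorrow.
```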
-- 
Thanks,

*Nipen Mark *


filtering footer information

2012-05-23 Thread Mark , N
Is it possible to filter certain repeated footer information out of text
documents while indexing to Solr?

Are there any built-in filters similar to stop word filters?




-- 
Thanks,

*Nipen Mark *


Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Thanks Ahmet.

Is it also possible to search for documents that have a field ENDING with
"week*"?

The query should return documents with a field ending with "week" and its
derivatives, such as "weekly" and "weeks".

So the above query should return:

"this week"
"Past three weeks"
"Report weekly"

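To pin down the intended matching rule, here is a small local sketch in Python.
It is only an illustration of "last token starts with week", using a regular
expression over plain strings; it is not Solr query syntax, and the extra value
"weekly report" is added just to show a non-match.

```python
import re

# Matches values whose last whitespace-separated token starts with "week".
ENDS_WITH_WEEK = re.compile(r"(?:^|\s)week\w*\s*$", re.IGNORECASE)

values = ["this week", "Past three weeks", "Report weekly", "weekly report"]
matches = [v for v in values if ENDS_WITH_WEEK.search(v)]
print(matches)  # ['this week', 'Past three weeks', 'Report weekly']
```
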
thanks
chandan



On Tue, Oct 5, 2010 at 5:04 PM, Ahmet Arslan  wrote:

> > Also does this plugin allow us to use proximity with wild
> > card
> > *  "solr mail*"~10 *
> >
>
> Yes it supports "solr mail*"~10 kind of queries without any problem.
>
> Currently it throws exception with "mail*" kind of queries, but they are
> not valid phrase queries. Because there is only one clause inside quotation
> marks.
>
>
>
>


-- 
Nipen Mark


Re: wildcard and proximity searches

2010-10-05 Thread Mark N
Hi

Were you successful in trying SOLR-1604 to allow wildcard queries in
phrases?

Also, does this plugin allow us to use proximity with a wildcard, e.g.
"solr mail*"~10

Is this the right approach to support these functionalities?

thanks
Mark





On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro <
frederico.azeite...@cision.com> wrote:

> Thanks for your idea.
>
> At this point I'm logging each query time. My idea is to divide my
> queries into "normal queries" and "heavy queries". I have some heavy
> queries that take 1 or 2 minutes to get results. But they have, for
> instance, (*word1* AND *word2* AND word3*). I guess that these will
> always be slower (could be a little faster with
> "ReversedWildcardFilterFactory") but they will never be ready in a few
> seconds. For now, I just increased the timeout for those :) (using
> solrnet).
>
> My priority at the moment is phrase queries like "word1* word2*
> word3". After this is working, I'll try to optimize the "heavy queries".
>
> Frederico
>
>
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: quarta-feira, 4 de Agosto de 2010 01:41
> To: solr-user@lucene.apache.org
> Subject: Re: wildcard and proximity searches
>
> Frederico Azeiteiro wrote:
> >
> >>> But it is unusual to use both leading and trailing * operator. Why are
> >>> you doing this?
> >
> > Yes I know, but I have a few queries that need this. I'll try the
> > "ReversedWildcardFilterFactory".
> >
> >
> >
>
> ReversedWildcardFilter will help with a leading wildcard, but will not help
> with a query that uses BOTH leading and trailing wildcards. It'll
> still be slow. Solr/Lucene isn't good at that; I didn't even know Solr
> would do it at all, in fact.
>
> If you really needed to do that, the way to play to Solr/Lucene's way of
> doing things would be to have a field where you actually index each
> _character_ as a separate token. Then leading and trailing wildcard
> search is basically reduced to a "phrase search", but where the words
> are actually characters. But then you're going to get an index where
> pretty much every token belongs to every document, which Solr isn't that
> great at either, but then you can apply "commongram" stuff on top to
> help that out a lot too. Not quite sure what the end result will be,
> I've never tried it. I'd only use that weird special "char as token"
> field for queries that actually required leading and trailing wildcards.
>
> Figuring out how to set up your analyzers, and what (if anything) you're
> going to have to do client-app-side to transform the user's query into
> something that'll end up searching like a "phrase search where each
> 'word' is a character", is left as an exercise for the reader. :)
>
> Jonathan
>



-- 
Nipen Mark


Re: question on wild card

2010-07-15 Thread Mark N
Thanks Erick.

One more question:

When "the perfect world*" is passed as the search query, it is converted to
"? perfect world". What does "?" mean?

Since I am using the standard analyzer, I thought the stop word "the" would be
removed.
thanks


On Thu, Jul 15, 2010 at 7:01 AM, Erick Erickson wrote:

> The best way to understand how things are parsed is to go to the solr admin
> page (Full interface link?) and click the "debug info" box and submit your
> query. That'll tell you exactly what happens.
>
> Alternatively, you can put &debugQuery=on on your URL...
>
> HTH
> Erick
>
> On Wed, Jul 14, 2010 at 8:48 AM, Mark N  wrote:
>
> > I have a database field = hello world and I am indexing it to the *text*
> > field with the standard analyzer (text is a copy field in Solr).
> >
> > Now when a user gives the query text:"hello world%", how is the query
> > interpreted in the background?
> >
> > Are we actually searching text:hello OR text:world% (assuming the
> > default operator is OR)?
> >
> >
> >
> >
> >
> >
> > --
> > Nipen Mark
> >
>



-- 
Nipen Mark


question on wild card

2010-07-14 Thread Mark N
I have a database field = hello world and I am indexing it to the *text* field
with the standard analyzer (text is a copy field in Solr).

Now when a user gives the query text:"hello world%", how is the query
interpreted in the background?

Are we actually searching text:hello OR text:world% (assuming the
default operator is OR)?
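
One way to check how such a query is parsed is to add debugQuery=on to the
request, as suggested in the reply in this thread. A small sketch that builds
such a URL; the base URL and the helper name debug_query_url are assumptions
for illustration, and the request itself is not sent here.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a select URL that asks Solr to include parsed-query debug output."""
    params = {"q": query, "debugQuery": "on"}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr base URL, for illustration only.
url = debug_query_url("http://localhost:8983/solr", 'text:"hello world%"')
print(url)
```

Note that the literal % in the query has to be URL-encoded (as %25), which
urlencode takes care of.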






-- 
Nipen Mark


Two analyzer per field

2010-07-12 Thread Mark N
Is it possible to specify two analyzers per field?

For example, consider the fields
*F1* (keyword analyzer) = "cheers mate"
*F2* (keyword analyzer) = "hello world"

There is also a copy field *TEXT* (standard analyzer) which will store
the terms { cheers mate hello world }.

Now when a user performs any search we will be looking only at the copy field
"TEXT", which uses the standard analyzer. Suppose the user searches for the
phrase "hello world": it will not return any result,
as the "hello" and "world" terms are tokenized.

Is it possible to also index "hello world" as-is into the *TEXT* field, i.e.
can I use the keyword analyzer as well as the standard analyzer for
field "TEXT"?
What would be a better approach to handle this situation?





-- 
Nipen Mark


Solr DataImportHandler

2010-04-08 Thread Mark N
Is it possible to use the Solr DataImportHandler when the database fields are
not fixed? As per my findings, we need to configure which table (entity)
to read the data from and which database fields map to
which fields in the Solr schema.

Since in my case the database fields could be dynamic, can DIH be helpful?

Please suggest.


-- 
Nipen Mark


indexing a huge data

2010-03-05 Thread Mark N
What would be the fastest way to index documents? I am indexing a huge
collection of data after extracting certain metadata,
for example the author and filename of each file.

I am extracting this information and storing it in XML format.

for example :1abc 
abc.doc
  2abc 
abc1.doc

I cannot index these documents directly to Solr as they are not in the format
required by Solr (I cannot change the format as it is used in other modules).

Would converting these files to CSV be a better and faster approach
compared to XML?
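
For what it is worth, the conversion itself is small. A rough sketch in Python,
assuming a layout with one element per record; the element names (records,
record, id, author, filename) are guesses, since the actual XML tags are not
visible in this message, and the function name xml_to_csv is illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input layout; the real element names are unknown.
SAMPLE_XML = """<records>
  <record><id>1</id><author>abc</author><filename>abc.doc</filename></record>
  <record><id>2</id><author>abc</author><filename>abc1.doc</filename></record>
</records>"""

FIELDS = ["id", "author", "filename"]


def xml_to_csv(xml_text, fields=FIELDS):
    """Flatten each <record> element into one CSV row with a header line."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for record in root.iter("record"):
        writer.writerow([record.findtext(name, default="") for name in fields])
    return out.getvalue()


print(xml_to_csv(SAMPLE_XML))
```

ET.fromstring loads the whole document into memory, so for a very large file a
streaming parser such as ET.iterparse would probably be a better fit.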



please  suggest




-- 
Nipen Mark


Re: Getting max/min dates from solr index

2010-02-16 Thread Mark N
Thanks.
Is it possible to do date faceting on multiple Solr shards?

I am using indexes created in two different shards to do date faceting on the
field "DATE":

*
http://localhost:8983/solr/1_13_1_3/select?&shards=localhost:8983/solr/index1/,localhost_two:8983/solr/index/&start=0&rows=20&q=*&facet=true&facet.date=DATE&facet.date.start=2004-01-01T00:00:00Z&facet.date.end=2011-01-01T00:00:00Z&facet.date.gap=%2B1YEAR
*




On Fri, Feb 12, 2010 at 3:39 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Mark,
>
> Yes, facets will give you that information. Min/max StatsComponent?
>  See http://www.search-lucene.com/?q=StatsComponent
>
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message 
> > From: Mark N 
> > To: solr-user@lucene.apache.org
> > Sent: Wed, February 10, 2010 8:12:43 AM
> > Subject: Getting max/min dates from solr index
> >
> > How can we get the max and min date from the Solr index? I would need
> > these dates to draw a graph (for example a timeline graph).
> >
> >
> > Also, can we use date faceting to show how many documents are indexed
> > every month?
> > Consider that I need to draw a timeline graph for the current year to show
> > how many records are indexed for every month. So I will have months on the
> > X axis and the number of documents on the Y axis.
> >
> > What would be a better approach to design a schema to achieve this
> > functionality?
> >
> >
> > Any suggestions would be appreciated
> >
> > thanks
> >
> >
> > --
> > Nipen Mark
>
>


-- 
Nipen Mark


Getting max/min dates from solr index

2010-02-10 Thread Mark N
How can we get the max and min date from the Solr index? I would need these
dates to draw a graph (for example a timeline graph).


Also, can we use date faceting to show how many documents are indexed every
month?
Consider that I need to draw a timeline graph for the current year to show how
many records are indexed for every month. So I will have months on the X axis
and the number of documents on the Y axis.

What would be a better approach to design a schema to achieve this
functionality?
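
As a sketch of what such a request could look like: the facet.date parameters
below are the same ones used in the shard URL elsewhere in this thread, with a
gap of one month, and stats/stats.field are the StatsComponent parameters.
Whether StatsComponent accepts a date field depends on the Solr version, so
treat that part as an assumption. The field name DATE, the base URL and the
helper name monthly_facet_url are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def monthly_facet_url(base_url, field, year):
    """Build a select URL with one date-facet bucket per month of the given year."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only the counts are needed, not the documents
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", f"{year}-01-01T00:00:00Z"),
        ("facet.date.end", f"{year + 1}-01-01T00:00:00Z"),
        ("facet.date.gap", "+1MONTH"),
        ("stats", "true"),  # assumption: min/max via StatsComponent
        ("stats.field", field),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = monthly_facet_url("http://localhost:8983/solr", "DATE", 2010)
print(url)
# Decode the query string again to check what Solr would receive.
print(parse_qs(urlsplit(url).query)["facet.date.gap"])
```

The "+" in the gap value must be sent URL-encoded (as %2B), which urlencode
does here.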


Any suggestions would be appreciated

thanks


-- 
Nipen Mark


solr updateCSV

2010-01-07 Thread Mark N
I am trying to use Solr's CSV updater to index the data. I am trying to
specify the .Dat format, consisting of a field separator, a text qualifier and a
line separator.

for example

field 1 < field separator>  field 2
value for field 1value for field 2 

Can we specify the text qualifier and line separator as well?

I have tested that we can specify a separator and it works well.
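
For the first two of these, the CSV update handler has separator and
encapsulator request parameters, where the encapsulator plays the role of a
text qualifier. A small sketch that builds such a URL; the delimiter characters,
the base URL and the helper name csv_update_url are assumptions for
illustration, and the line separator is not covered here.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, separator, encapsulator):
    """Build a CSV update URL with a custom field separator and text qualifier."""
    params = {
        "separator": separator,
        "encapsulator": encapsulator,
        "commit": "true",
    }
    return f"{base_url}/update/csv?{urlencode(params)}"


# Assumed delimiters: pipe as separator, double quote as text qualifier.
url = csv_update_url("http://localhost:8983/solr", "|", '"')
print(url)
```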



-- 
Nipen Mark


Indexing large text documents

2010-01-05 Thread Mark N
SolrInputDocument doc1 = new SolrInputDocument();
 doc1.addField( "Fulltext", strContent);

strContent is a string variable which contains the contents of a text file
(assume that the text file is located at c:\files\abc.txt).

In my case the text files (such as abc.txt) could be very large, ~2 GB, so it
is not always possible to read and store them in string variables
before indexing. Can anyone suggest a better approach to index
these huge text files?
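
One general way around holding the whole file in a single string is to read it
in fixed-size pieces. A minimal Python sketch of that idea (the thread's own
snippet is Java/SolrJ; Python is used here only for brevity). How the pieces
would then be sent to Solr, for example as separate documents, is left open and
is an assumption; the function name read_in_chunks is illustrative.

```python
import os
import tempfile


def read_in_chunks(path, chunk_chars=1024 * 1024, encoding="utf-8"):
    """Yield successive pieces of a text file without loading it whole."""
    with open(path, "r", encoding=encoding) as handle:
        while True:
            chunk = handle.read(chunk_chars)
            if not chunk:
                break
            yield chunk


# Small demonstration with a temporary file standing in for the large one.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w", encoding="utf-8") as demo:
    demo.write("abcdefghij")

chunks = list(read_in_chunks(demo_path, chunk_chars=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
os.remove(demo_path)
```

Chunk boundaries can fall in the middle of a word, so a real splitter would
probably want to cut at whitespace or line breaks instead.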



-- 
Nipen Mark


Enumerating wildcard terms

2009-12-08 Thread Mark N
Is it possible to enumerate all terms that match a specified wildcard
filter term, similar to Lucene's WildcardTermEnum API?

For example, if I search for abc* then I should be able to access all the
terms abc1, abc2, abc3... that exist in the index.

What would be a better approach to provide this functionality?
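
For a trailing wildcard like abc*, Solr's TermsComponent with terms.prefix may
cover this. A sketch that builds such a request; it assumes a /terms request
handler is configured in solrconfig.xml, and the field name "text", the base
URL and the helper name terms_prefix_url are illustrative only.

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url, field, prefix, limit=100):
    """Build a TermsComponent URL listing indexed terms that start with prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": str(limit),
    }
    return f"{base_url}/terms?{urlencode(params)}"


url = terms_prefix_url("http://localhost:8983/solr", "text", "abc")
print(url)
```

This only handles prefixes; a wildcard in the middle or at the start of the
term would need a different approach.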




-- 
Nipen Mark


Re: nested solr queries

2009-11-30 Thread Mark N
Thanks for your help. So do you think I should execute the Solr queries twice,
or are there any other workarounds?




On Mon, Nov 30, 2009 at 3:07 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 2:26 PM, Mark N  wrote:
>
> > field2="xyz" we dont know until we run query1
> >
> >
> Ah, ok. I thought xyz was a literal that you wanted to search.
>
>
> > To simply i was actually trying to do some kind of JOIN similar to
> > following
> > SQL query
> >
> >
> >  select  * from table1  where  *field2*  in
> >  ( select *field2  *from dbo.concept_db where field1='ABC' )
> >
> > if this is not possible then i will have to search inner query  (
> > select *field2
> > *from dbo.concept_db where field1='ABC' )  first and then only  run the
> > outer query
> >
> >
> No, there are no joins in Solr. Consider de-normalizing your schema, if you
> haven't.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Nipen Mark


Re: nested solr queries

2009-11-30 Thread Mark N
We don't know field2="xyz" until we run query1.

To simplify, I was actually trying to do some kind of JOIN similar to the
following SQL query:


 select * from table1 where field2 in
 ( select field2 from dbo.concept_db where field1='ABC' )

If this is not possible then I will have to run the inner query
( select field2 from dbo.concept_db where field1='ABC' ) first and only then
run the outer query.
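
The two-step approach described above could be sketched like this on the client
side: take the field2 values returned by the first query and turn them into one
OR query for the second request. The helper names and sample values are made
up, and no request is sent here.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def outer_query(field, values):
    """Build field:("v1" OR "v2" ...) from the values found by the inner query."""
    unique = sorted(set(values))
    if not unique:
        return None  # the inner query matched nothing, so there is nothing to ask
    return f"{field}:(" + " OR ".join(quote_term(v) for v in unique) + ")"


# Hypothetical field2 values returned by the first query (field1:"ABC").
inner_results = ["xyz", "abc", "xyz"]
print(outer_query("field2", inner_results))  # field2:("abc" OR "xyz")
```

With many distinct values the second query can get long; Solr's
maxBooleanClauses setting limits how many clauses one query may contain.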

thanks
chandan




On Mon, Nov 30, 2009 at 2:25 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 2:02 PM, Mark N  wrote:
>
> > hi shalin
> >
> > I am trying to achieve something like JOIN. Previously am doing this with
> > two queries on solr
> >
> > solr index  = ( field1 ,field 2, field3)
> >
> > query1 = (  for  example field1="ABC" )
> >
> > suppose query1 returns results set1= { 1, 2 ,3 ,4 } which matches query1
> >
> > query2 = (   get all records having field2="xyz" for each records  i.e
>  for
> > set1= {1,2,3,4} returned by query1 )
> >
> >
> That sequence of queries will return documents which have field1="ABC" and
> field2="xyz". The same result can be obtained in one query with
> q=+field1:"ABC" +field2:"xyz"
>
> Have I misunderstood the problem?
>
>
> > Am not sure if I could do something like this using the nested solr query
> > from link
> >
> > http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/
> >
> >
> No, nested queries can only influence scores. They do not filter the
> results.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: nested solr queries

2009-11-30 Thread Mark N
Hi Shalin,

I am trying to achieve something like a JOIN. Previously I was doing this with
two queries on Solr.

Solr index = (field1, field2, field3)

query1 = (for example field1="ABC")

Suppose query1 returns result set1 = { 1, 2, 3, 4 }, the records which match
query1.

query2 = (get all records having field2="xyz" for each record, i.e. for
set1 = {1,2,3,4} returned by query1)

I am not sure if I could do something like this using the nested Solr query
from this link:

http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/



thanks


On Mon, Nov 30, 2009 at 1:50 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Nov 30, 2009 at 1:19 PM, Mark N  wrote:
>
> > Is it possible to write nested queries in Solr similar to sql like query
> > where  I can take results of the first query and use one or more of its
> > fields as an argument in the second query.
> >
> >
> That sounds like a join. If so, the answer would be no.
>
>
> >
> > For example:
> >
> > field1:XYZ AND (_query_: field3:{value of field4})
> >
> > This should search for all types of XYZ and then iterate over the result
> > set
> > and perform a query for where field3  is equal to the value of field1
> from
> > each item of the first result set.
> >
> >
> Your description is not consistent with the query you have given. If
> field:XYZ is specified, then what are "types" of XYZ? Also, if you want to
> perform a query where field3 is equal to the value of field1 then, what is
> field4 in the query you have given?
>
>
> > this is similar to SQL like query
> >
> >
> > select distinct ( fieldA ) from table where fieldA  IN
> >
>
> That sounds similar to faceting. See
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> Perhaps you can give more details on what you want to achieve.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


nested solr queries

2009-11-29 Thread Mark N
Is it possible to write nested queries in Solr, similar to an SQL query,
where I can take the results of the first query and use one or more of its
fields as an argument in the second query?


For example:

field1:XYZ AND (_query_: field3:{value of field4})

This should search for all types of XYZ and then iterate over the result set
and perform a query where field3 is equal to the value of field1 from
each item of the first result set.

This is similar to an SQL query like:


select distinct ( fieldA ) from table where fieldA  IN