Re: When Index is Updated Frequently
Nearly 100 ms? If any netizen ever complained about that, I'd 'round-file' the complaint. Internal to a single process's execution, well, maybe it's an issue. Not too hard to handle. Good job to the team that made it!

From: Michael McCandless
To: solr-user@lucene.apache.org; bing...@asu.edu
Cc: Bing Li
Sent: Fri, March 4, 2011 10:45:05 AM
Subject: Re: When Index is Updated Frequently

On Fri, Mar 4, 2011 at 10:09 AM, Bing Li wrote:
> According to my experience, when the Lucene index is updated frequently, its
> performance must become low. Is that correct?

In fact Lucene can gracefully handle a high rate of updates with low-latency turnaround on the readers, using the near-real-time (NRT) API -- IndexWriter.getReader() (or, in the soon-to-be-released 3.1, IndexReader.open(IndexWriter)).

NRT is really a hybrid of "eventual consistency" and "immediate consistency", because it lets your app have full control over how quickly changes must be visible, by controlling when you pull a new NRT reader.

That said, Lucene can't offer true immediate consistency at a high update rate -- the time to open a new NRT reader is usually too costly to pay, e.g., for every search. But every 100 msec (say) is reasonable (depending on many variables...).

So... for your app you should run some tests and see. And please report back.

(But, unfortunately, NRT hasn't been exposed in Solr yet...)

-- Mike
http://blog.mikemccandless.com
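(For anyone who hasn't used the NRT API: a minimal sketch of the reopen loop Mike describes, against the Lucene 3.x-era API as I recall it; addOrUpdateDocuments() is a hypothetical helper, and the 100 ms interval is just the figure from his mail:)

    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_31, analyzer));
    IndexReader reader = writer.getReader();       // initial NRT reader
    while (running) {
        addOrUpdateDocuments(writer);              // high rate of updates
        Thread.sleep(100);                         // reopen every ~100 ms, not per search
        IndexReader newReader = reader.reopen();   // cheap if nothing changed
        if (newReader != reader) {
            reader.close();
            reader = newReader;                    // searches now see recent updates
        }
    }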
Re: GET or POST for large queries?
Probably you could do it, and solving a problem in business supersedes 'rightness' concerns, much to the dismay of geeks and 'those who like rightness and say the word "Neemph!"'. The non-rightness here is that POST, PUT, and DELETE are assumed to make changes to the URL's backend, while GET is assumed NOT to make changes. So if your POST does not make a change... it breaks convention. But if it solves the problem... :-)

Another way would be to GET with a 'query file' location, and then have the server fetch that query and execute it.

Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs in them :-)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

From: mrw
To: solr-user@lucene.apache.org
Sent: Thu, February 17, 2011 11:27:06 AM
Subject: GET or POST for large queries?

We are running into some issues with large queries. Initially, they were ostensibly header buffer overruns, because increasing Jetty's headerBufferSize value to 65536 resolved them. This seems like a kludge, but it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that, for which increasing the headerBufferSize to 65536 does not work. This is due to security requirements: security descriptors are baked into the index, and then potentially thousands of them (depending on the user context) are passed in with each query. These oversized queries are only a problem for approximately 5% of users, who are highly entitled, but the number of security descriptors is likely to increase, and we won't have a workaround for this security policy any time soon.

After a lot of Googling, it seems common to increase the headerBufferSize, but I don't see any other strategies. Is it possible/feasible to switch to POST for querying?

Thanks!
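(If the client is SolrJ, switching the query to POST is a one-argument change; a sketch using the Solr 1.4-era client class, URL illustrative. The params travel in the request body, sidestepping Jetty's headerBufferSize entirely:)

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery(hugeQueryString); // e.g. thousands of OR'd descriptors
    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);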
Re: My Plan to Scale Solr
What's an 'SLA'?

Dennis Gearon

From: Stijn Vanhoorelbeke
To: solr-user@lucene.apache.org; bing...@asu.edu
Sent: Thu, February 17, 2011 4:28:13 AM
Subject: Re: My Plan to Scale Solr

Hi,

I'm currently looking at SolrCloud. I've managed to set up a scalable cluster with ZooKeeper. (See the examples in http://wiki.apache.org/solr/SolrCloud for a quick understanding.) This way, all the different shards/replicas are stored in a centralised configuration. Moreover, ZooKeeper gives you out-of-the-box load balancing.

So, let's say you have 2 different shards and each is replicated 2 times. Your ZooKeeper config will look like this:

  /configs ...
  /live_nodes (v=6 children=4)
    lP_Port:7500_solr (ephemeral v=0)
    lP_Port:7574_solr (ephemeral v=0)
    lP_Port:8900_solr (ephemeral v=0)
    lP_Port:8983_solr (ephemeral v=0)
  /collections (v=20 children=1)
    collection1 (v=0 children=1) "configName=myconf"
      shards (v=0 children=2)
        shard1 (v=0 children=3)
          lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
          lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
        shard2 (v=0 children=2)
          lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"

--> This setup can be realised with one ZooKeeper module -- the other Solr machines just need to know the IP:port where ZooKeeper is active, and that's it.
--> So no per-node configuration/installation is needed to quickly realise a scalable, load-balanced cluster.

Disclaimer: ZooKeeper support is a relatively new feature -- I'm not sure it will work out yet in a real production environment with a tight SLA attached. But definitely keep your eyes on this stuff -- it will mature quickly!

Stijn Vanhoorelbeke
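(For reference, the wiki example Stijn points to boots a cluster along roughly these lines -- commands paraphrased from the SolrCloud wiki of that era, so check the page itself before copying; -DzkRun starts an embedded ZooKeeper on the Solr port plus 1000:)

    # first node: run embedded ZooKeeper and upload the config as 'myconf'
    java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -jar start.jar

    # every other node: just point at the running ZooKeeper
    java -DzkHost=localhost:9983 -jar start.jar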
Re: Searching for negative numbers very slow
Is it my imagination, or has this exact email been on the list already?

Dennis Gearon

From: Chris Hostetter
To: solr-user@lucene.apache.org
Cc: yo...@lucidimagination.com
Sent: Wed, February 16, 2011 6:20:28 PM
Subject: Re: Searching for negative numbers very slow

: This was my first thought but -1 is relatively common but we have other
: numbers just as common.

I assume that when you say that, you mean "...we have other numbers (that are not negative) just as common, (but searching for them is much faster)"?

I don't have any insight into why your negative numbers are slower, but FWIW...

: Interestingly enough
:
: fq=uid:-1
: fq=foo:bar
: fq=alpha:omega
:
: is much (4x) slower than
:
: q="uid:-1 AND foo:bar AND alpha:omega"

...this is (in and of itself) not that surprising for any three arbitrary disjoint queries. When a BooleanQuery is a full conjunction like this (all clauses required), it can efficiently skip scoring a lot of documents by looping over the clauses, asking each one for the "next" doc it matches, and then leapfrogging the other clauses to that doc. In the case of the three "fq" params, each query is executed in isolation, and *all* of the matches of each are accounted for.

The speed of using distinct "fq" params in situations like this comes from the reuse after they are in the filterCache -- you can change fq=foo:bar to fq=foo:baz on the next query, and still reuse 2/3 of the work that was done on the first query. Likewise, if the next query is fq=uid:-1&fq=foo:bar&fq=alpha:beta then 2/3 of the work is already done again, and if a following query is fq=uid:-1&fq=foo:baz&fq=alpha:beta then all of the work is already done and cached, even though that particular request has never been seen by Solr.

-Hoss
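(The filterCache Hoss refers to is configured in solrconfig.xml; a typical stanza looks like this -- sizes are illustrative, not recommendations. Each entry caches the set of documents matching one fq clause:)

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>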
Re: Title index to wiki
Please show me this link -- http://wiki.apache.org/solr/TitleIndex -- on this page: http://wiki.apache.org/solr/ (where I said it would be a good idea), or on this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters (selected at random). It's one thing to know that the titles can be searched; it's another to know what the topics are that can be searched for.

Sorry if this is curt, I've worked a LOONG week.

Dennis Gearon

From: Markus Jelsma
To: solr-user@lucene.apache.org
Cc: Dennis Gearon
Sent: Fri, February 11, 2011 8:07:24 AM
Subject: Re: Title index to wiki

What do you mean? There are two links to the FrontPage on each page.

On Friday 11 February 2011 16:56:41 Dennis Gearon wrote:
> I think it would be an improvement to the wikis if the link to the title
> index were at the top of the index page of the wikis :-) I looked on that
> index page and did not see that link on that page. Who's got
> write access to the wiki pages?

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Title index to wiki
I think it would be an improvement to the wikis if the link to the title index were at the top of the index page of the wikis :-) I looked on that index page and did not see that link on that page. Who's got write access to the wiki pages?
Wiki table of contents.
Is there a detailed, perhaps alphabetical and hierarchical, table of contents for all the wikis on the Solr site?
Re: dynamic fields revisited
I have a long way to go to understand all those implications. Mind you, I never -was- whining :-). Just ignorantly surprised.

Dennis Gearon

From: Markus Jelsma
To: solr-user@lucene.apache.org
Cc: gearond
Sent: Mon, February 7, 2011 3:28:18 PM
Subject: Re: dynamic fields revisited

It would be quite annoying if it behaved as you were hoping. This way it is possible to use different field types (and analyzers) for the same field value. In faceting, for example, this can be important because you should use analyzed fields for q and fq but unanalyzed fields for facet.field. The same goes for sorting and range queries, where you can use the same field value to end up in different field types, one for sorting and one for a range query. Without the prefix or suffix of the dynamic field, one would have to statically declare the fields beforehand and lose the dynamic advantage.

> Just so anyone else can know, and save themselves half an hour if they spend four
> minutes searching.
>
> When putting a dynamic field into a document in an index, the name of the
> field RETAINS the 'constant' part of the dynamic field name.
>
> Example
> -
> If a dynamic integer field is named '*_i' in the schema.xml file,
> __and__
> you insert a field named 'my_integer_i', which matches the globbed field
> name '*_i',
> __then__
> the name of the field will be 'my_integer_i' in the index
> and in your GETs/(updating) POSTs to the index on that document, and
> __NOT__
> 'my_integer' like I was kind of hoping it would be :-(
>
> I.e., the suffix (or prefix, if you set it up that way) will NOT be
> dropped. I was hoping that everything except the globbing character, '*',
> would just be a flag to the query processor and disappear after being
> 'noticed'.
>
> Not so :-)
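(For readers following along, the schema.xml side of this looks like the following; the type names mirror the stock example schema, and this is a sketch, not a full schema:)

    <!-- any incoming field ending in _i is accepted and typed as an int;
         the document keeps the FULL name, e.g. 'my_integer_i' -->
    <dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>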
Re: Optimize searches; business is progressing with my Solr site
Hmmm, my default distance for geospatial was excluding the results, I believe. I have to check whether I was actually looking at the desired return result for 'ballroom' alone. Maybe I wasn't. But I saw a lot to learn when I applied the techniques you gave me. Thank you :-)

Dennis Gearon

From: Erick Erickson
To: solr-user@lucene.apache.org
Sent: Sun, February 6, 2011 8:21:15 AM
Subject: Re: Optimize searches; business is progressing with my Solr site

What does &debugQuery=on give you? Second, what optimizations are you doing? What shows up in the analysis page? Does your admin page show the terms you expect in your copyField?

Best
Erick

On Sun, Feb 6, 2011 at 2:03 AM, Dennis Gearon wrote:
> Thanks to LOTS of information from you guys, my site is up and working. It's
> only an API now, I need to work on my OWN front end, LOL!
>
> I have my second customer. My general-purpose repository API is very useful, I'm
> finding. I will soon be in the business of optimizing the search engine part.
>
> For example: I have a copy field that has the words 'boogie woogie ballroom' in
> lots of records. I cannot find those records using
> 'boogie/boogi/boog', or the woogie versions of those, but I can with 'ballroom'.
> For my VERY first lesson in optimization of search, what might be causing that,
> and where are the places to read about this on the Solr site?
>
> All the best on a Sunday, guys and gals.
>
> Dennis Gearon
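(The debug output Erick asks about comes back inline with the results; a request like the one below -- URL illustrative -- shows how the query was parsed and why each document scored as it did:)

    http://localhost:8983/solr/select?q=boogie&debugQuery=on&indent=true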
Optimize searches; business is progressing with my Solr site
Thanks to LOTS of information from you guys, my site is up and working. It's only an API now, I need to work on my OWN front end, LOL!

I have my second customer. My general-purpose repository API is very useful, I'm finding. I will soon be in the business of optimizing the search engine part.

For example: I have a copy field that has the words 'boogie woogie ballroom' in lots of records. I cannot find those records using 'boogie/boogi/boog', or the woogie versions of those, but I can with 'ballroom'. For my VERY first lesson in optimization of search, what might be causing that, and where are the places to read about this on the Solr site?

All the best on a Sunday, guys and gals.

Dennis Gearon
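(A hedged aside on the likely cause: Solr matches whole analyzed tokens, so 'boog' only matches 'boogie' if prefixes are indexed. One common fix is an edge n-gram field at index time -- a sketch, all names illustrative:)

    <fieldType name="text_prefix" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- indexes boo, boog, boogi, boogie for the token 'boogie' -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>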
Re: prices
That's a good idea, Yonik. So fields that aren't stored don't get displayed, and the float field in the schema never gets seen by the user. Good, I like it.

Dennis Gearon

- Original Message -
From: Yonik Seeley
To: solr-user@lucene.apache.org
Sent: Fri, February 4, 2011 10:49:42 AM
Subject: Re: prices

On Fri, Feb 4, 2011 at 12:56 PM, Dennis Gearon wrote:
> Using Solr 1.4.
>
> I have a price in my schema. Currently it's a tfloat. Somewhere along the way
> from PHP, JSON, and Solr and back, extra zeroes are getting truncated, along with
> the decimal point for even dollar amounts.
>
> So I have two questions, neither of which seemed to be findable with Google.
>
> A/ Any way to keep both zeroes going into a float field? (In the analyzer, with
> XML output, the values are shown with one zero.)
> B/ Can strings be used in range queries like a float, and work well for prices?

You could do a copyField into a stored string field and use the tfloat (or tint, and store cents) for range queries, searching, etc., and the string field just for display.

-Yonik
http://lucidimagination.com
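(In schema.xml, Yonik's suggestion comes out roughly like this -- field names invented for the sketch. Note that copyField copies the *original* input text, so "10.00" survives verbatim in the string field:)

    <field name="price"         type="tfloat" indexed="true"  stored="false"/>
    <field name="price_display" type="string" indexed="false" stored="true"/>
    <copyField source="price" dest="price_display"/>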
prices
Using Solr 1.4.

I have a price in my schema. Currently it's a tfloat. Somewhere along the way from PHP, JSON, and Solr and back, extra zeroes are getting truncated, along with the decimal point for even dollar amounts.

So I have two questions, neither of which seemed to be findable with Google.

A/ Any way to keep both zeroes going into a float field? (In the analyzer, with XML output, the values are shown with one zero.)
B/ Can strings be used in range queries like a float, and work well for prices?

Dennis Gearon
Re: changing schema
Well, the nice thing is that I have an Amazon-based dev server, and it's stored as an AMI. So if I screw something up, I just throw away that server and get a fresh one, all configured and full of dev data, and BAM, back to where I was. So I'll try it again with the -rf flags. I did shut down the server, and I am using Tomcat.

Dennis Gearon

- Original Message -
From: Gora Mohanty
To: solr-user@lucene.apache.org
Sent: Thu, February 3, 2011 6:56:29 AM
Subject: Re: changing schema

On Thu, Feb 3, 2011 at 6:47 PM, Erick Erickson wrote:
> Erik:
>
> Is this a Tomcat-specific issue? Because I regularly delete just the
> data/index directory on my Windows box running Jetty without any
> problems (3_x and trunk).
>
> Mostly want to know because I just encouraged someone to delete the
> index dir based on my experience...
>
> Thanks
> Erick
>
> On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher wrote:
>
>> The trick is, you have to remove the data/ directory, not just the
>> data/index subdirectory. And of course then restart Solr.
>>
>> Or delete *:* (with commit=true), depending on what's the best fit for your ops.
>>
>> Erik
>>
>> On Feb 1, 2011, at 11:41, Dennis Gearon wrote:
>>
>> > I tried removing the index directory once, and Tomcat refused to start up because
>> > it didn't have a segments file.
[...]

I have seen this error with Tomcat, but in my experience it has been due to doing "rm data/index/*" rather than "rm -rf data/index", or due to doing this without first shutting down Tomcat.

Regards,
Gora
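(The delete-everything route Erik mentions avoids touching the filesystem at all; via SolrJ it is just the following -- 1.4-era client class, URL illustrative:)

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
    server.deleteByQuery("*:*"); // empty the index in place
    server.commit();             // make the deletion visible; no restart needed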
MANY thanks for help on path so far (first of 2 steps on a 1000-step path :-)
Got my API to input into both the database and the Solr instance, and to search geographically/chronologically in Solr. Next is Update and Delete. And then... and then... and then...

Dennis Gearon
Time fields
For time-of-day fields -- NOT Unix timestamps/dates -- what is the best way to do that? I can think of seconds since the beginning of the day, as integer OR string. Any other ideas? Assume that I'll be using range queries. TIA.

Dennis Gearon
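(A hedged sketch of the seconds-since-midnight approach, which suits range queries; the field name is invented. A trie int keeps the range query fast, and 8 AM to 5 PM is 8*3600=28800 through 17*3600=61200:)

    <field name="time_of_day" type="tint" indexed="true" stored="true"/>

    ...&fq=time_of_day:[28800 TO 61200]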
Re: OAI on SOLR already done?
I guess I didn't understand 'metadata'. That's why I asked the question.

Dennis Gearon

- Original Message -
From: Jonathan Rochkind
To: "solr-user@lucene.apache.org"
Sent: Wed, February 2, 2011 2:26:32 PM
Subject: Re: OAI on SOLR already done?

On 2/2/2011 5:19 PM, Dennis Gearon wrote:
> Does something like this work to extract dates, phone numbers, and addresses across
> international formats and languages?
>
> Or just in the plain ol' USA?

What are you talking about? Nothing discussed in this thread does any 'extracting' of dates, phone numbers, or addresses at all, whether in international or domestic formats.
Re: OAI on SOLR already done?
Does something like this work to extract dates, phone numbers, and addresses across international formats and languages?

Or just in the plain ol' USA?

Dennis Gearon

- Original Message -
From: Demian Katz
To: "solr-user@lucene.apache.org"
Cc: Paul Libbrecht
Sent: Wed, February 2, 2011 12:40:58 PM
Subject: RE: OAI on SOLR already done?

I already replied to the original poster off-list, but it seems worth weighing in here as well... The next release of VuFind (http://vufind.org) is going to include OAI-PMH server support. As you say, there is really no way to plug OAI-PMH directly into Solr... but a tool like VuFind can provide a fairly generic, extensible, Solr-based platform for building an OAI-PMH server. Obviously this is helpful for some use cases and not others... but I'm happy to provide more information if anyone needs it.

- Demian

From: Jonathan Rochkind [rochk...@jhu.edu]
Sent: Wednesday, February 02, 2011 3:38 PM
To: solr-user@lucene.apache.org
Cc: Paul Libbrecht
Subject: Re: OAI on SOLR already done?

The trick is that you can't just put a generic black-box OAI-PMH provider on top of any Solr index. How would it know where to get the metadata elements it needs, such as title, or last-updated date, etc.? Any given Solr index might not even have these in stored fields -- and a given app might want to look them up from somewhere other than stored fields.

If the Solr index does have them in stored fields, and you do want to get them from the stored fields, then it's, I think (famous last words), relatively straightforward code to write: a mapping from Solr stored fields to the metadata elements needed for OAI-PMH, and then simply outputting the XML template with those filled in. I am not aware of anyone who has done this as a reusable, configurable-for-your-Solr tool. You could possibly do it solely using the built-in Solr JSP/XSLT/other templating stuff I am not familiar with, rather than as an external Solr client app, or it could be an external Solr client app.

This is actually a very similar problem to something someone else asked a few days ago: "Does anyone have an OpenSearch add-on for Solr?" Very, very similar problem, just with a different XML template for the output (usually RSS or Atom) instead of OAI-PMH.

On 2/2/2011 3:14 PM, Paul Libbrecht wrote:
> Peter,
>
> I'm afraid your service is harvesting, and I am trying to look at a PMH provider service.
>
> Your project appeared early in the Google matches.
>
> paul
>
> On 2 Feb 2011, at 20:46, Péter Király wrote:
>
>> Hi,
>>
>> I don't know whether it fits your need, but we are building a tool
>> based on Drupal (the eXtensible Catalog Drupal Toolkit), which can harvest
>> with OAI-PMH and index the harvested records into Solr. The records are
>> harvested, processed, and stored in MySQL, then we index them into
>> Solr. We created some ways to manipulate the original values before
>> sending them to Solr. We built it in a modular way, so you can change
>> settings in an admin interface or write your own "hooks" (special
>> Drupal functions) to tailor the application to your needs. We support
>> only Dublin Core and our own FRBR-like schema (called the XC schema), but
>> you can add more schemas. Since this forum is about Solr, and not
>> applications using Solr, if you are interested in this tool, please write me a
>> private message, or visit http://eXtensibleCatalog.org, or the
>> module's page at http://drupal.org/project/xc.
>>
>> Hope this helps,
>>
>> Péter
>> eXtensible Catalog
>>
>> 2011/2/2 Paul Libbrecht:
>>> Hello list,
>>>
>>> I've met a few Google matches that indicate that Solr-based servers implement
>>> the Open Archives Initiative's Metadata Harvesting Protocol.
>>>
>>> Is there something made to be re-usable that would be an add-on to Solr?
>>>
>>> thanks in advance
>>>
>>> paul
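(Jonathan's "mapping plus XML template" idea, sketched in SolrJ for the case where the elements do live in stored fields. Every field name here is hypothetical, and esc() stands in for whatever XML-escaping helper you use:)

    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("last_updated:[" + from + " TO " + until + "]"); // OAI from/until args
    for (SolrDocument doc : solr.query(q).getResults()) {
        out.println("<record><metadata><oai_dc:dc>");
        out.println("  <dc:title>" + esc(doc.getFieldValue("title")) + "</dc:title>");
        out.println("  <dc:date>"  + esc(doc.getFieldValue("last_updated")) + "</dc:date>");
        out.println("</oai_dc:dc></metadata></record>");
    }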
Re: changing schema
Cool, thanks for the tip, Erik :-) There's so much to learn, and I haven't even gotten to tuning the thing for best results.

Dennis Gearon

- Original Message -
From: Erik Hatcher
To: solr-user@lucene.apache.org
Sent: Tue, February 1, 2011 9:24:24 AM
Subject: Re: changing schema

The trick is, you have to remove the data/ directory, not just the data/index subdirectory. And of course then restart Solr.

Or delete *:* (with commit=true), depending on what's the best fit for your ops.

Erik

On Feb 1, 2011, at 11:41, Dennis Gearon wrote:

> I tried removing the index directory once, and Tomcat refused to start up because
> it didn't have a segments file.
>
> - Original Message -
> From: Erick Erickson
> To: solr-user@lucene.apache.org
> Sent: Tue, February 1, 2011 5:04:51 AM
> Subject: Re: changing schema
>
> That sounds right. You can cheat and just remove data/index
> rather than delete *:*, though (you should probably do that with the Solr
> instance stopped).
>
> Make sure to remove the directory "index" as well.
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon wrote:
>
>> Anyone got a great little script for changing a schema?
[...]
Re: changing schema
I tried removing the index directory once, and Tomcat refused to start up because it didn't have a segments file.

- Original Message -
From: Erick Erickson
To: solr-user@lucene.apache.org
Sent: Tue, February 1, 2011 5:04:51 AM
Subject: Re: changing schema

That sounds right. You can cheat and just remove data/index rather than delete *:*, though (you should probably do that with the Solr instance stopped).

Make sure to remove the directory "index" as well.

Best
Erick

On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon wrote:

> Anyone got a great little script for changing a schema?
>
> I.e., after changing:
>   the database,
>   the view in the database for data import,
>   the data-config.xml file,
>   the schema.xml file,
>
> I BELIEVE that I have to run:
>   a delete command for the whole index (*:*),
>   a full import, and an optimize.
>
> Does this all sound right?
>
> Dennis Gearon
changing schema
Anyone got a great little script for changing a schema?

I.e., after changing:
  the database,
  the view in the database for data import,
  the data-config.xml file,
  the schema.xml file,

I BELIEVE that I have to run:
  a delete command for the whole index (*:*),
  a full import, and an optimize.

Does this all sound right?

Dennis Gearon
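(Not a script, but the DataImportHandler folds the last two steps into one request -- full-import cleans the index first by default, and can optimize when it finishes; host and core in the URL are illustrative:)

    http://localhost:8983/solr/dataimport?command=full-import&clean=true&optimize=true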
first search on index
So, is it normal for the first search against a freshly made index to return nothing?

Dennis Gearon
field names for solr spatial
I would love it if I could use 'latitude' and 'longitude' in all places. But it seems that the Solr spatial plugin for 1.4 only works with lat/lng. Any way to change that?

Dennis Gearon
Re: get SOMETHING out of an index
Well, this is the query that USED to work, before we massaged the schema (*I* did):

    solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial lat=37.221293 long=-121.979192 radius=1000 unit=km threadCount=3} *:*

WHOOPS!!! Just for fun, after spending HOURS screwing around with exceptions -- after following some bad directions on the web to just delete the index directory before doing a new data import -- I tried the query above, and now it works. I don't know enough to know why. To get it working, I copied an index directory from another instance with an incorrect schema, issued a delete-all command (*:*), then did the data import and optimize, and voila! Along the way, I had to change the owner and group of the replaced ../index directory and files back to tomcat6.

I THINK that I had one of the 'lng' fields in one of the three config files of interest as 'long'. I'll ask some questions about that in the next email.

Dennis Gearon

- Original Message -
From: Estrada Groups
To: "solr-user@lucene.apache.org"
Sent: Sat, January 29, 2011 9:35:56 PM
Subject: Re: get SOMETHING out of an index

It would be really helpful to send along your schema.xml file so we can see how you are indexing these points. Polygons and linestrings are not supported yet. Another good way to test is using the Solr admin tool, or hand-jamming your params in manually. Type *:* as your query in the admin tool and see what it returns. It should return all indexed fields and their values.

Keep in mind that your radius search has to be done on a field of type solr.LatLonType, so check out the field called 'store' in the example config file. From there you can start to build out the rest of your queries, starting with {!type=geofilt}. I have example code that I can send along tomorrow.

For the Solr/Lucene contributors out there, what was the point of storing lats and longs in individual fields if they can't really be used for anything? If they can, please gimme an example that uses the solr.PointType type.

Adam

On Jan 29, 2011, at 11:09 PM, Dennis Gearon wrote:

> I indexed my whole database (only 52k records).
>
> It has some geospatial data in it. I set the geospatial search to a 1000 km radius
> centered on the town where they all are, and NADA comes out.
>
> How can I find out what's in the index and get at least ONE document out?
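(For later readers: the geofilt syntax Adam alludes to -- available on trunk/3.x-era Solr rather than the 1.4 plugin -- looks roughly like this, with 'store' being the example schema's LatLonType field; the point and distance are illustrative:)

    q=*:*&fq={!geofilt sfield=store pt=37.221293,-121.979192 d=1000}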
get SOMETHING out of an index
I indexed my whole database (only 52k records).

It has some geospatial data in it. I set the geospatial search to a 1000 km radius centered on the town where they all are, and NADA comes out.

How can I find out what's in the index and get at least ONE document out?

Dennis Gearon
Re: match count per shard and across shards
Sounds like the interface level to achieve this is multiple indexes.

Dennis Gearon

- Original Message -
From: Upayavira
To: solr-user@lucene.apache.org
Sent: Sat, January 29, 2011 3:51:45 PM
Subject: Re: match count per shard and across shards

To my knowledge, the distributed search functionality is intended to be transparent; thus no details deriving from it are exposed (e.g. which docs come from which shard), so, no, I don't believe it to be possible.

The only way I know right now that you could achieve it is with two (sets of) queries. One would be a distributed search across all shards, and the other would be a single hit to every shard. To fake such a facet, this second set of queries would only need to ask for totals, so it could use rows=0. Otherwise you'd have to enhance the distributed search code to expose some of this information in its response.

Upayavira

On Sat, 29 Jan 2011 03:48 -0800, "csj" wrote:
>
> Hi,
>
> Is it possible to construct a Solr query that will return the total number
> of hits across all shards, and at the same time get the number of
> hits per shard?
>
> I was thinking along the lines of a faceted search, but I'm not deep enough
> into Solr's capabilities and query parameters to figure it out.
>
> Regards,
>
> Christian Sonne Jensen

---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
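(Concretely, Upayavira's two-step approach comes out as something like the following; hosts and shard addresses are illustrative. rows=0 returns just numFound, so the per-shard requests are cheap:)

    # one distributed query for the cross-shard total
    http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr&rows=0

    # one plain query per shard for the per-shard counts
    http://host1:8983/solr/select?q=foo&rows=0
    http://host2:8983/solr/select?q=foo&rows=0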
Thoughts on USING dynamic fields for extending objects
Well, mid next month we're going to start using dynamic fields as they relate to our business rules. Basically, it involves having a basic set of objects in code/database, flattened for search in Solr. The MAIN business object is to be extendable by the customer, while still having to supply the required fields in the base object. We will use dynamic fields of defined types.

I had a question for those more experienced than I. We are thinking about two possible usage patterns:

A/ Users can add any field they want, as long as they use the right suffix for the field. Changing the schema can be done at will, and updating past objects is totally on the user. They get:
  1/ Find within the field.
  2/ Range queries.
  3/ Other future single-field functionality later.

B/ Users can NOT add any field they want; they must submit a schema, hopefully automated. The data still goes into the Solr index as dynamically accepted fields, as long as they use the right suffix for the field. Changing the schema is done by submitting the new schema. Updating past objects is STILL totally on the user. They get:
  1/ Find within the field.
  2/ Range queries.
  3/ Various filter functions like mandatory fields, acceptable ranges, minimum lengths on strings, and other processing.
  4/ Other future single-field functionality later.
  5/ The ability to make their own copyFields for 'grouping' of their own fields.

'A' I see as simplest to administer, but possibly has security holes? THAT's my main question; all thoughts welcome. 'B' is better as a value-added service, but means a LOT more work on our site's end, I believe. We could also possibly reject sensitive field names for security?

Any thoughts much appreciated.

Dennis Gearon
Re: Solr for noSQL
Personally, I just create a view that flattens out the database and renames the fields as I desire. Then I call the view with the DIH to import it. Solr doesn't know anything about the database, except how to get a connection and fetch rows. And that's pretty darn useful -- just that much less code to write.

Dennis Gearon

- Original Message -
From: Upayavira
To: solr-user@lucene.apache.org
Sent: Fri, January 28, 2011 1:41:42 AM
Subject: Re: Solr for noSQL

On Thu, 27 Jan 2011 21:38 -0800, "Dennis Gearon" wrote:
> Why not make one's own DIH handler, Lance?

Personally, I don't like that approach. Solr is best related to as something of a black box that you configure, then push content to. Having Solr know about your data sources, and pull content in, seems to me to be mixing concerns.

I relate to the DIH as a useful tool for smaller sites or for prototyping, but would expect anything more substantial to require an indexing application that gives you full control over the indexing process. It could be a lightweight app that uses a MongoDB Java client and SolrJ, and simply pulls from one and pushes to the other. If you don't want to run another JVM, it could run as a separate webapp within your Solr JVM.

From an architectural point of view, do you configure MySQL, or MongoDB for that matter, to pull content into itself? Likewise, Solr should be a service that listens, waiting to be given data.

Upayavira
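(With the view-plus-DIH pattern, the data-config.xml stays tiny because the view already did the flattening and renaming; all names below are illustrative:)

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb"
                  user="solr" password="..."/>
      <document>
        <!-- the view's column names already match the schema's field names -->
        <entity name="doc" query="SELECT * FROM solr_flat_view"/>
      </document>
    </dataConfig>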
Re: Does solr support indexing of files other than UTF-8
Use the ICONV library in your server-side language. Convert the text to UTF-8, store it with a field describing what encoding it came in with, and re-encode it on the way out if you wish.

Dennis Gearon

- Original Message -
From: prasad deshpande
To: solr-user@lucene.apache.org
Sent: Fri, January 28, 2011 12:41:29 AM
Subject: Re: Does solr support indexing of files other than UTF-8

Thanks, Paul. However, I want to support indexing of files in local encodings. How would I achieve that?

On Thu, Jan 27, 2011 at 2:46 PM, Paul Libbrecht wrote:

> At least in Java, UTF-8 transcoding is done on a stream basis. No issue there.
>
> paul
>
> On 27 Jan 2011, at 09:51, prasad deshpande wrote:
>
> > The docs can be huge; suppose there is an 800MB PDF file to index --
> > I need to translate it to UTF-8 and then send the file for indexing. Now
> > suppose there can be any number of clients who can upload files; at that
> > point it will affect performance. And our product already supports
> > localization with local encodings.
> >
> > Thanks,
> > Prasad
> >
> > On Thu, Jan 27, 2011 at 2:04 PM, Paul Libbrecht wrote:
> >
> >> Why is converting documents to UTF-8 not feasible?
> >> Nowadays any platform offers such services.
> >>
> >> Can you give a detailed failure description (maybe with the URL to a sample
> >> document you post)?
> >>
> >> paul
> >>
> >> On 27 Jan 2011, at 07:31, prasad deshpande wrote:
> >>> I am able to successfully index/search non-English data (like Hebrew and
> >>> Japanese) encoded in UTF-8.
> >>> However, when I tried to index data encoded in a local encoding, like
> >>> Big5, I could not see the desired results.
> >>> The contents looked garbled for the Big5-encoded document when I
> >>> searched for all indexed documents.
> >>>
> >>> Converting a complete document to UTF-8 is not feasible.
> >>> I am not very clear on how Solr supports localization with encodings
> >>> other than UTF-8.
> >>>
> >>> I verified the links below:
> >>> 1. http://lucene.apache.org/java/3_0_3/api/all/index.html
> >>> 2. http://wiki.apache.org/solr/LanguageAnalysis
> >>>
> >>> Thanks and Regards,
> >>> Prasad
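(Paul's point about stream-based transcoding, sketched in Java -- the iconv equivalent Dennis mentions; file name and charset are illustrative. The fixed-size buffer means even an 800MB file transcodes in constant memory:)

    Reader in  = new BufferedReader(new InputStreamReader(
                     new FileInputStream("doc.txt"), "Big5"));
    Writer out = new OutputStreamWriter(
                     new FileOutputStream("doc-utf8.txt"), "UTF-8");
    char[] buf = new char[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);   // decode local encoding, re-encode as UTF-8
    }
    in.close();
    out.close();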
Re: Solr for noSQL
Why not make one's own DIH handler, Lance?

Dennis Gearon

- Original Message -
From: Lance Norskog
To: solr-user@lucene.apache.org
Sent: Thu, January 27, 2011 9:33:25 PM
Subject: Re: Solr for noSQL

There are no special connectors available to read from key-value stores like memcache/Cassandra/MongoDB. You would have to get a Java client library for the DB and code your own DataImportHandler data source. I cannot recommend this; you should make your own program that reads the data and uploads it to Solr with one of the Solr client libraries.

Lance

On 1/27/11, Jianbin Dai wrote:
> Hi,
>
> Do we have a data import handler to quickly read in data from a noSQL
> database -- specifically MongoDB, which I am thinking of using?
>
> Or, a more general question: how does Solr work with noSQL databases?
>
> Thanks.
>
> Jianbin
Re: How to group result when search on multiple fields
This is probably either 'shingling' or 'facets'. Someone more experienced can verify that or add more details.

Dennis Gearon

- Original Message -
From: cyang2010
To: solr-user@lucene.apache.org
Sent: Wed, January 26, 2011 3:35:47 PM
Subject: How to group result when search on multiple fields

Let me give an example to illustrate my question: on the Netflix site, the search box allows you to search by movie, TV show, actor, director, and genre. If "Tomcat" is searched, it gives results as: movie titles with "Tomcat" or whatever, and, somewhere in between, it also shows two actors, "Tom Cruise" and "Tom Hanks". Then follow a lot of other movie titles.

If this is all based on the same type of index document (titles that have a title name, associated actors, directors, and genres), then the search results are all titles. How is it able to render matching actors as part of the result? In other words, how does it tell that some movies are returned because of an actor match?

If it is implemented as two different types of index document -- one document type for titles (name, actors, directors...) and the other for actors (actor name, movie/TV titles) -- how does it merge the results? As far as I can tell, the actor names can appear anywhere in the search results, as a group. Is it just comparing the score of the first actor document with the scores of the title matches, and then deciding where to insert the actor match results? Well, that can be inaccurate, right? Scores from two different types of document are not comparable, right?

Let me know your thoughts on this. Thanks in advance.
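(One common approach, for what it's worth: index both document types in one schema with a discriminator field, so a single query returns titles and actors together and a facet reports how many of each matched -- a hedged sketch, all field names invented:)

    q=tomcat&defType=dismax&qf=title_name actor_name
      &facet=true&facet.field=doc_type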
Re: in-index representation of tokens
I am saying: is there a list of tokens that have been parsed (a table of them) for each column? Or one for the whole index?

Dennis Gearon

- Original Message -
From: Jonathan Rochkind
To: "solr-user@lucene.apache.org"
Sent: Tue, January 25, 2011 9:29:36 AM
Subject: Re: in-index representation of tokens

Why does it matter? You can't really get at them unless you store them. I don't know what "table per column" means; there's nothing in the Solr architecture called a "table" or a "column". (Although by "column" you probably mean, more or less, a Solr "field".) There is nothing like a "table" in Solr. Solr is still not an RDBMS.

On 1/25/2011 12:26 PM, Dennis Gearon wrote:
> So, the index is a list of tokens per column, right?
>
> There's a table per column that lists the analyzed tokens?
>
> And the tokens per column are represented as what -- system integers? 32/64-bit
> unsigned ints?
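(For the curious, the usual simplified mental model of a Lucene segment -- from general Lucene documentation rather than this thread: one term dictionary whose entries are (field, term text) pairs, each pointing at a postings list of internal int document ids:)

    (field, term)          postings (internal docids)
    (title, "ballroom") -> 3, 9
    (title, "boogie")   -> 3, 17, 42
    (desc,  "boogie")   -> 8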
in-index representation of tokens
So, the index is a list of tokens per column, right?

There's a table per column that lists the analyzed tokens?

And the tokens per column are represented as what -- system integers? 32/64-bit unsigned ints?

Dennis Gearon
Re: DIH serialize
Depends on your process chain to the eventual viewer/consumer of the data. The questions to ask are:

A/ Is the data IN Solr going to be viewed or processed in its original form?
--> set stored="true"
--> no serialization needed.

B/ If it's going to be analyzed and searched for separately from any other field, the analysis will put it into an unreadable form. If you need to see it, then
--> set indexed="true" and stored="true"
--> no serialization needed.

C/ If it's NOT going to be viewed AS IS, and it's not going to be searched for AS IS (i.e., other columns will be how the data is found), and you have another, serializable format:
--> set indexed="false" and stored="true"
--> serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all.

D/ If it's NOT going to be viewed AS IS, BUT it IS going to be searched for AS IS (this column will be how the data is found), and you have another, serializable format, you need to put it into TWO columns:
--> a SERIALIZED field: set indexed="false" and stored="true"
--> an UNSERIALIZED field: set indexed="true" and stored="false"
--> serialize AS PER THE INTENDED APPLICATION; not sure that Solr can do that at all.

Hope that helps!

Dennis Gearon

- Original Message -
From: Papp Richard
To: solr-user@lucene.apache.org
Sent: Sun, January 23, 2011 2:02:05 PM
Subject: DIH serialize

Hi all,

I wasted the last few hours trying to serialize some column values (from MySQL) into a Solr column, but I just can't find such a function. I'll use the value in PHP -- I don't know if it is possible to serialize in PHP style at all. This is what I tried, and it works with a given factor:

in schema.xml:
  . . .

in the DIH xml:
  <![CDATA[
    function my_serialize(row) {
      row.put('main_timetable', row.toString());
      return row;
    }
  ]]>
  . . .

> Can I use java directly in script (
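(The wiring around that function, for anyone reconstructing it -- DIH's ScriptTransformer runs JavaScript on Rhino; the function name and field mirror the post, while the query and output format are invented for the sketch:)

    <script><![CDATA[
      function my_serialize(row) {
        // DIH has no built-in PHP-style serialize(); emit your own
        // string format (e.g. JSON) that PHP can parse back out
        row.put('main_timetable', row.get('open') + '|' + row.get('close'));
        return row;
      }
    ]]></script>

    <entity name="shop" transformer="script:my_serialize"
            query="SELECT open, close FROM timetable"/>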
one last questoni on dynamic fields
Is it possible to use ONE definition of a dynamic field type for inserting multiple dynamic fields of that type with different names? Or do I need a separate dynamic field definition for each eventual field?

Can I do this?
  . . .
and then, on insert, supply all their values:
  9802490824908 9809084 09845970011 09874523459870

Dennis Gearon
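(A hedged reconstruction of the schema and update XML the question implies -- the answer is yes, one pattern serves any number of field names. A string type is used here, since the sample values overflow an int and carry leading zeroes; all field names are invented:)

    <!-- schema.xml: ONE definition -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

    <!-- update XML: MANY fields matching it, each keeping its full name -->
    <doc>
      <field name="phone_home_s">9802490824908</field>
      <field name="phone_work_s">9809084</field>
      <field name="acct_a_s">09845970011</field>
      <field name="acct_b_s">09874523459870</field>
    </doc>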
Re: filter update by IP
Most times people do this by running Solr ONLY on localhost, and running some kind of permission scheme through a server-side application.

Dennis Gearon

- Original Message -
From: Erik Hatcher
To: solr-user@lucene.apache.org
Sent: Sun, January 23, 2011 10:47:02 AM
Subject: Re: filter update by IP

No. SolrQueryRequest doesn't (currently) have access to the actual HTTP request coming in. You'll need to do this either with a servlet filter registered in web.xml, or restrict it with some other external firewall-ish technology.

Erik

On Jan 23, 2011, at 13:21, Teebo wrote:

> Hi
>
> I would like to restrict access to the /update/csv request handler.
>
> Is there a ready-to-use UpdateRequestProcessor for that?
>
> My first idea was to inherit from CSVRequestHandler and to override
>
>   public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
>     ...
>     restrict-by-IP code
>     ...
>     super.handleRequest(req, rsp);
>   }
>
> What do you think?
>
> Regards,
> t.
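(A minimal sketch of the servlet-filter route Erik suggests -- map it to /update/* in web.xml; the allowed address is illustrative:)

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletResponse;

    public class UpdateIpFilter implements Filter {
        public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain)
                throws IOException, ServletException {
            if ("127.0.0.1".equals(req.getRemoteAddr())) {
                chain.doFilter(req, rsp);                   // local caller: allow
            } else {
                ((HttpServletResponse) rsp).sendError(403); // everyone else: forbidden
            }
        }
        public void init(FilterConfig config) {}
        public void destroy() {}
    }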
Re: api key filtering
Totally agree: do it at indexing time, in the index.

Dennis Gearon

- Original Message -
From: Jonathan Rochkind
To: "solr-user@lucene.apache.org"
Sent: Sat, January 22, 2011 5:28:50 PM
Subject: RE: api key filtering

If you COULD solve your problem by indexing 'public', or other tokens from a limited vocabulary of document roles, in a field -- then I'd definitely suggest you look into doing that, rather than doing odd things with Solr instead. If the only barrier is not currently having sufficient logic at the indexing stage to do that, then it is going to end up being a lot less of a headache in the long term to simply add a layer at the indexing stage, than to try to get Solr to do things outside of, well, its comfort zone.

Of course, depending on your requirements, it might not be possible to do that; maybe you can't express the semantics in terms of a limited set of roles applied to documents. And then maybe your best option really is sending an up-to-2k-element list (not exactly the same list every time, presumably) of acceptable documents to Solr with every query, and maybe you can get that to work reasonably. Depending on how many different complete lists of documents you have, maybe there's a way to use Solr caches effectively in that situation, or maybe that's not even necessary, since lookup by unique id should be pretty quick anyway; not really sure. Otherwise you'd have to enhance things yourself.

But if the semantics are possible, it is much better to work with Solr than against it; it's going to take a lot less tinkering to get Solr to perform well if you can just send an fq=role:public or something, instead of a list of document IDs. You won't need to worry about it, it'll just work, because you know you're having Solr do what it's built to do. Totally worth a bit of work to add a logic layer at the indexing stage, IMO.

From: Erick Erickson [erickerick...@gmail.com]
Sent: Saturday, January 22, 2011 4:50 PM
To: solr-user@lucene.apache.org
Subject: Re: api key filtering

1024 is the default number; it can be increased. See maxBooleanClauses in solrconfig.xml.

This shouldn't be a problem with 2k clauses, but expanding it to tens of thousands is probably a mistake (but test to be sure).

Best
Erick

On Sat, Jan 22, 2011 at 3:50 PM, Matt Mitchell wrote:

> Hey, thanks, I'll definitely have a read. The only problem with this, though,
> is that our api is a thin layer of app code, with Solr only (no db); we
> index data from our SQL db into Solr, and push the index off for consumption.
>
> The only other idea I had was to send a list of the allowed document ids
> along with every Solr query, but then I'm sure I'd run into a filter query
> limit. Each key could be associated with up to 2k documents, so that's 2k
> values in an fq, which would probably be too many for Lucene (I think its
> limit is 1024).
>
> Matt
>
> On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon wrote:
>
> > The only way that you would have that many api keys per record is if one of
> > them represented 'public', right? 'public' is a ROLE. Your answer is to use
> > RBAC-style techniques.
> >
> > Here are some links that I have on the subject. Sorry for the formatting,
> > Firefox is freaking out. I cut and pasted these from an email in my sent box.
> > I hope the links came out.
> >
> > Part 1
> > http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
> >
> > Part 2
> > Role-based access control in SQL, part 2 at Xaprb
> >
> > ACL/RBAC Bookmarks ALL
> >
> > UserRbac - symfony - Trac
> > A Role-Based Access Control (RBAC) system for PHP
> > Appendix C: Task-Field Access
> > Role-based access control in SQL, part 2 at Xaprb
> > PHP Access Control - PHP5 CMS Framework Development | PHP Zone
> > Linux file and directory permissions
> > MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root Password
> > per RECORD/Entity permissions? - symfony users | Google Groups
> > Special Topics: Authentication and Authorization | The Definitive Guide to Yii |
> > Yii Framework
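(What "do it in the index" looks like in practice -- a multi-valued roles field written at index time, then one short fq per request; all names are illustrative:)

    <field name="roles" type="string" indexed="true" stored="false" multiValued="true"/>

    # anonymous API key:
    ...&fq=roles:public
    # a key entitled to two roles:
    ...&fq=roles:(public OR premium)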
I cut and pasted these > from > > an > > email from my sent box. I hope the links came out. > > > > > > Part 1 > > > > > > >http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ >/ > > > > > > Part2 > > Role-based access control in SQL, part 2 at Xaprb > > > > > > > > > > > > ACL/RBAC Bookmarks ALL > > > > UserRbac - symfony - Trac > > A Role-Based Access Control (RBAC) system for PHP > > Appendix C: Task-Field Access > > Role-based access control in SQL, part 2 at Xaprb > > PHP Access Control - PHP5 CMS Framework Development | PHP Zone > > Linux file and directory permissions > > MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root > > Password > > per RECORD/Entity permissions? - symfony users | Google Groups > > Special Topics: Authentication and Authorization | The Definitive Guide > to > > Yii | > > Yii Framework >
Re: api key filtering
Got it, here are the links that I have on RBAC/ACL/Access Control. Some of these are specific to Solr.

http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/
http://php.dzone.com/articles/php-access-control?page=0,1
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://www.tonymarston.net/php-mysql/menuguide/appendixc.html
http://trac.symfony-project.org/wiki/UserRbac
http://code.google.com/p/kohana-mptt/source/browse/trunk/acl/libraries/Acl.php?r=82
http://www.oracle.com/technetwork/articles/javaee/ajax-135201.html
http://phpgacl.sourceforge.net/
http://www.java2s.com/Code/Java/GWT/ClassthatactsasaclienttoaJSONservice.htm
http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql
http://dev.juokaz.com/
http://stackoverflow.com/questions/54230/cakephp-acl-database-setup-aro-aco-structure
http://blog.reardonsoftware.com/2010/07/spring-security-acl-schema-for-oracle.html
http://www.mail-archive.com/symfony-users@googlegroups.com/msg29537.html
http://www.schemaweb.info/schema/SchemaInfo.aspx?id=167
http://www.assembla.com/code/backendpro/subversion/nodes/trunk/modules/auth/libraries/Khacl.php?rev=169
http://framework.zend.com/wiki/display/ZFUSER/Using+Zend_Acl+with+a+database+backend
http://www.w3.org/2001/04/20-ACLs#Structure
http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1759372
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/#comment-112
https://issues.apache.org/jira/browse/SOLR-1834
http://www.yiiframework.com/doc/guide/1.1/en/topics.auth#role-based-access-control
http://www.yiiframework.com/doc/guide/topics.auth#role-based-access-control

- Original Message -
From: Dennis Gearon
To: solr-user@lucene.apache.org
Sent: Sat, January 22, 2011 1:22:04 PM
Subject: Re: api key filtering

Dang! There were hot, clickable links in the web mail I put them in. I guess you guys can search for those strings on Google and find them. Sorry.
- Original Message From: Dennis Gearon To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:09:26 PM Subject: Re: api key filtering The links didn't work, so here they are again, NOT from a sent folder: PHP Access Control - PHP5 CMS Framework Development | PHP Zone A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb UserRbac - symfony - Trac Acl.php - kohana-mptt - Project Hosting on Google Code CANDIDATE-PHP Generic Access Control Lists http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql makeAclTables.sql php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow PHP Generic Access Control Lists Reardon's Ruminations: Spring Security ACL Schema for Oracle Re: [symfony-users] Implementing an existing ACL API in symfony SchemaWeb - Classes And Properties - ACL Schema trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Using Zend_Acl with a database backend - Zend Framework Wiki W3C ACL System Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 12:50:24 PM Subject: Re: api key filtering
Re: api key filtering
Dang! There were hot, clickable links in the web mail I put them in. I guess you guys can search for those strings on google and find them. Sorry. - Original Message From: Dennis Gearon To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:09:26 PM Subject: Re: api key filtering The links didn't work, so here they are again, NOT from a sent folder: PHP Access Control - PHP5 CMS Framework Development | PHP Zone A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb UserRbac - symfony - Trac Acl.php - kohana-mptt - Project Hosting on Google Code CANDIDATE-PHP Generic Access Control Lists http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql makeAclTables.sql php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow PHP Generic Access Control Lists Reardon's Ruminations: Spring Security ACL Schema for Oracle Re: [symfony-users] Implementing an existing ACL API in symfony SchemaWeb - Classes And Properties - ACL Schema trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Using Zend_Acl with a database backend - Zend Framework Wiki W3C ACL System Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 12:50:24 PM Subject: Re: api key filtering Hey thanks I'll definitely have a read. The only problem with this though, is that our api is a thin layer of app-code, with solr only (no db), we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq which would probably be too many for lucene (I think its limit 1024). Matt On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon wrote: > The only way that you would have that many api keys per record, is if one > of > them represented 'public', right? 'public' is a ROLE. Your answer is to use > RBAC > style techniques. > > > Here are some links that I have on the subject. What I'm thinking of doing > is: > Sorry for formatting, Firefox is freaking out. I cut and pasted these from > an > email from my sent box. I hope the links came out. > > > Part 1 > http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ > > Part 2 > Role-based access control in SQL, part 2 at Xaprb > > > > > > ACL/RBAC Bookmarks ALL > > UserRbac - symfony - Trac > A Role-Based Access Control (RBAC) system for PHP > Appendix C: Task-Field Access > Role-based access control in SQL, part 2 at Xaprb > PHP Access Control - PHP5 CMS Framework Development | PHP Zone > Linux file and directory permissions > MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root > Password > per RECORD/Entity permissions?
- symfony users | Google Groups > Special Topics: Authentication and Authorization | The Definitive Guide to > Yii | > Yii Framework > > att.net Mail (gear...@sbcglobal.net) > Solr - User - Modelling Access Control > PHP Generic Access Control Lists > Row-level Model Access Control for CakePHP « some flot, some jet > Row-level Model Access Control for CakePHP « some flot, some jet > Yahoo! GeoCities: Get a web site with easy-to-use site building tools. > Class that acts as a client to a JSON service : JSON « GWT « Java > Juozas Kaziukėnas devBlog > Re: [symfony-users] Implementing an existing ACL API in symfony > php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow > W3C ACL System > makeAclTables.sql > SchemaWeb - Classes And Properties - ACL Schema > Reardon's Ruminations: Spring Security ACL Schema for Oracle > trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla > Acl.php - kohana-mptt - Project Hosting on Google Code > Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform > The page cannot be found > > > Dennis Gearon > > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself.
Re: api key filtering
The links didn't work, so here they are again, NOT from a sent folder: PHP Access Control - PHP5 CMS Framework Development | PHP Zone A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb UserRbac - symfony - Trac Acl.php - kohana-mptt - Project Hosting on Google Code CANDIDATE-PHP Generic Access Control Lists http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql makeAclTables.sql php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow PHP Generic Access Control Lists Reardon's Ruminations: Spring Security ACL Schema for Oracle Re: [symfony-users] Implementing an existing ACL API in symfony SchemaWeb - Classes And Properties - ACL Schema trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Using Zend_Acl with a database backend - Zend Framework Wiki W3C ACL System Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 12:50:24 PM Subject: Re: api key filtering Hey thanks I'll definitely have a read. The only problem with this though, is that our api is a thin layer of app-code, with solr only (no db), we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq which would probably be too many for lucene (I think its limit 1024). Matt On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon wrote: > The only way that you would have that many api keys per record, is if one > of > them represented 'public', right? 'public' is a ROLE. Your answer is to use > RBAC > style techniques. > > > Here are some links that I have on the subject. What I'm thinking of doing > is: > Sorry for formatting, Firefox is freaking out. I cut and pasted these from > an > email from my sent box. I hope the links came out. > > > Part 1 > http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ > > Part 2 > Role-based access control in SQL, part 2 at Xaprb > > > > > > ACL/RBAC Bookmarks ALL > > UserRbac - symfony - Trac > A Role-Based Access Control (RBAC) system for PHP > Appendix C: Task-Field Access > Role-based access control in SQL, part 2 at Xaprb > PHP Access Control - PHP5 CMS Framework Development | PHP Zone > Linux file and directory permissions > MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root > Password > per RECORD/Entity permissions? - symfony users | Google Groups > Special Topics: Authentication and Authorization | The Definitive Guide to > Yii | > Yii Framework > > att.net Mail (gear...@sbcglobal.net) > Solr - User - Modelling Access Control > PHP Generic Access Control Lists > Row-level Model Access Control for CakePHP « some flot, some jet > Row-level Model Access Control for CakePHP « some flot, some jet > Yahoo!
GeoCities: Get a web site with easy-to-use site building tools. > Class that acts as a client to a JSON service : JSON « GWT « Java > Juozas Kaziukėnas devBlog > Re: [symfony-users] Implementing an existing ACL API in symfony > php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow > W3C ACL System > makeAclTables.sql > SchemaWeb - Classes And Properties - ACL Schema > Reardon's Ruminations: Spring Security ACL Schema for Oracle > trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla > Acl.php - kohana-mptt - Project Hosting on Google Code > Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform > The page cannot be found > > > Dennis Gearon > > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a > better > idea to learn from others’ mistakes, so you do not have to make them > yourself. > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > EARTH has a Right To Life, > otherwise we all die. > > > > - Original Message
Re: api key filtering
The only way that you would have that many api keys per record, is if one of them represented 'public', right? 'public' is a ROLE. Your answer is to use RBAC style techniques. Here are some links that I have on the subject. What I'm thinking of doing is: Sorry for formatting, Firefox is freaking out. I cut and pasted these from an email from my sent box. I hope the links came out. Part 1 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ Part2 Role-based access control in SQL, part 2 at Xaprb ACL/RBAC Bookmarks ALL UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone Linux file and directory permissions MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root Password per RECORD/Entity permissions? - symfony users | Google Groups Special Topics: Authentication and Authorization | The Definitive Guide to Yii | Yii Framework att.net Mail (gear...@sbcglobal.net) Solr - User - Modelling Access Control PHP Generic Access Control Lists Row-level Model Access Control for CakePHP « some flot, some jet Row-level Model Access Control for CakePHP « some flot, some jet Yahoo! GeoCities: Get a web site with easy-to-use site building tools. Class that acts as a client to a JSON service : JSON « GWT « Java Juozas Kaziukėnas devBlog Re: [symfony-users] Implementing an existing ACL API in symfony php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow W3C ACL System makeAclTables.sql SchemaWeb - Classes And Properties - ACL Schema Reardon's Ruminations: Spring Security ACL Schema for Oracle trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Acl.php - kohana-mptt - Project Hosting on Google Code Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform The page cannot be found Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 11:48:22 AM Subject: api key filtering Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to "allowed" db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is that, what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
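A minimal sketch of the approach Matt describes — an indexed, non-stored, multi-valued key field plus a filter query (the field name api_key is his; the key value is illustrative):

  <!-- schema.xml -->
  <field name="api_key" type="string" indexed="true" stored="false" multiValued="true"/>

  http://localhost:8983/solr/select?q=*:*&fq=api_key:abc123

Because a simple fq like this is cached in Solr's filter cache, repeated queries for the same key stay cheap; the 1024 limit Matt worries about (maxBooleanClauses) only bites when a single query ORs together thousands of values.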
Re: Integrating Surround Query Parser
Sounds to me like you either have to find a way to use a parser that is NOT a child class of org.apache.solr.search.QParserPlugin (not sure if that's possible), or you have to find out what's wrong with the file. Where did you get it, and have you talked to the author? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ahson Iqbal To: Solr Send Mail Sent: Thu, January 20, 2011 11:24:37 PM Subject: Integrating Surround Query Parser Hi All I want to integrate the Surround Query Parser with solr. To do this I downloaded a jar file from the internet, pasted that jar file in web-inf/lib, and configured the query parser in solrconfig.xml. Now when I load the solr admin page the following exception comes: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin What I think is that I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to accurately integrate this plugin with solr? thanx Ahsan
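For reference, the exception just means the class registered in solrconfig.xml must extend org.apache.solr.search.QParserPlugin — the raw Lucene surround QueryParser is a query parser, not a Solr plugin, so a thin wrapper class has to be registered instead. A hedged sketch of the registration (the wrapper class name is hypothetical):

  <!-- solrconfig.xml -->
  <queryParser name="surround" class="com.example.SurroundQParserPlugin"/>

after which queries would select it with q={!surround}...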
Re: pruning search result with search score gradient
that's a pretty good idea, using 'delta score' Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Toke Eskildsen To: "solr-user@lucene.apache.org" Sent: Thu, January 20, 2011 11:31:48 PM Subject: Re: pruning search result with search score gradient On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote: > I would like to be able to prune my search result by removing the less > relevant documents. I'm thinking about using the search score: I use > the search scores of the document set (I assume they are sorted in > descending order), normalise them (0 would be the lowest value and 1 > the greatest value) and then calculate the gradient of the normalised > scores. The documents with a gradient below a threshold value would be > rejected. As part of experimenting with federated search, this is one approach we'll be trying out to determine which results to discard when merging. > If the scores are linearly decreasing, then no document is rejected. > However, if there is a brutal score drop, then the documents below the > drop are rejected. So if we have the scores 1.0, 0.9, 0.2, 0.15, 0.1, 0.05 then the slopes will be 0.1, 0.7, 0.05, 0.05, 0.05 and with a slope threshold of 0.5, we would discard everything from score 0.2 and below. It makes sense if the scores are linear with the relevance (a document with score 0.8 has double the relevance of one with 0.4). I don't know if they are, so experiments must be made and I fear that this is another demonstration of the inherent problem with quantifying quality. - Toke
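A rough PHP sketch of the cut-off Toke describes, using raw consecutive score drops (min-max normalisation, as Julien suggests, could be applied first; this is not code from the thread):

  // Keep hits until the drop between consecutive scores exceeds $threshold.
  function pruneByScoreDrop(array $scores, $threshold) {
      $keep = array($scores[0]);
      for ($i = 1, $n = count($scores); $i < $n; $i++) {
          if ($scores[$i - 1] - $scores[$i] > $threshold) {
              break; // brutal score drop: discard this hit and everything after it
          }
          $keep[] = $scores[$i];
      }
      return $keep;
  }

  // pruneByScoreDrop(array(1.0, 0.9, 0.2, 0.15, 0.1, 0.05), 0.5) => array(1.0, 0.9)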
Re: Document level security
Would you do that with 1000's of users? How expensive in processor time is it? Have you ever benchmarked it? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Grijesh To: solr-user@lucene.apache.org Sent: Thu, January 20, 2011 11:05:33 PM Subject: Re: Document level security Hi Rok, I have used about 25 ids with the OR operator and it's working fine for me. Just have to increase the maxBooleanClauses parameter, and also have to configure the max header size on the servlet container to enable big query requests. - Thanx: Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Document-level-security-tp2298066p2300117.html Sent from the Solr - User mailing list archive at Nabble.com.
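The two knobs Grijesh mentions, for reference — the boolean-clause limit lives in solrconfig.xml (the default is 1024), and the header size is container-specific (the Jetty snippet below is illustrative; values are examples, not recommendations):

  <!-- solrconfig.xml -->
  <maxBooleanClauses>4096</maxBooleanClauses>

  <!-- jetty.xml, on the connector -->
  <Set name="headerBufferSize">65536</Set>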
Re: Document level security
I'm thinking of using something like this: http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/ - Original Message From: Dennis Gearon To: solr-user@lucene.apache.org Sent: Thu, January 20, 2011 8:21:02 PM Subject: Re: Document level security I'm not sure how you COULD do searching without having the permissions in the documents. I mentally use the model of unix filesystems, as a starter. Simple, but powerful. If I needed a separate table for permissions, or index, I'd have to do queries with GINORMOUS amounts of OR statements. I see it flowing like: User U Has Access to Documents DS (40,000,000 out of 100,000,000 of them), Now get these (list of 40x10^6) documents. How do you see it Peter? Dennis Gearon - Original Message From: Peter Sturge To: solr-user@lucene.apache.org Sent: Thu, January 20, 2011 3:16:59 PM Subject: Re: Document level security Hi, One of the things about Document Security is that it never involves just one thing. There are a lot of things to consider, and unfortunately, they're generally non-trivial. Deciding how to store/hold/retrieve permissions is certainly one of those things, and you're right, you should avoid attaching permissions to document data in the index, because if you want to change permissions (and you will want to change them at some point), it can be a cumbersome job, particularly if it involves millions of documents, replication, shards etc. It's also generally a good idea not to tie your schema to permission fields. Another big consideration is authentication - how can you be sure the request is coming from the user you think it is? Is there a certificate involved? Has the user authenticated to the container? If so, how do you get to this? and so on... For permissions storage, there are two realistic approaches to consider: 1. Write a SearchComponent that handles permission requests. This typically involves storing/reading permissions in/from a file, database or separate index (see SOLR-1872) 2. Use an LCF module to retrieve permissions from the original documents themselves (see SOLR-1834) Hope this helps, Peter On Thu, Jan 20, 2011 at 8:44 PM, Rok Rejc wrote: > Hi all, > > I have an index containing a couple of million documents. > Documents are grouped into "groups", each group contains from 1000-2 > documents. > > The problem: > Each group has defined permission settings. It can be viewed by public, > viewed by registered users, or viewed by a list of users (each group has its > own list of users). > Said differently: I need document security. > > What I read from the other threads is that it is not recommended to store > permissions in the index. I have already all the permissions in the > database, but I don't "know" how to connect the database and the index. > I can query the database to get the groups in which the user is and after > that do the OR query, but I am afraid that this list can be too big (100 > OR's could also exceed maximum HTTP GET query string length). > > What are the other options? Should I write a custom collector which will > query (and cache) the database for permissions? > > Any ideas are appreciated... > > Many thanks, Rok >
Re: Document level security
I'm not sure how you COULD do searching without having the permissions in the documents. I mentally use the model of unix filesystems, as a starter. Simple, but powerful. If I needed a separate table for permissions, or index, I'd have to do queries with GINORMOUS amounts of OR statements. I see it flowing like: User U Has Access to Documents DS (40,000,000 out of 100,000,000 of them), Now get these (list of 40x10^6) documents. How do you see it Peter? Dennis Gearon - Original Message From: Peter Sturge To: solr-user@lucene.apache.org Sent: Thu, January 20, 2011 3:16:59 PM Subject: Re: Document level security Hi, One of the things about Document Security is that it never involves just one thing. There are a lot of things to consider, and unfortunately, they're generally non-trivial. Deciding how to store/hold/retrieve permissions is certainly one of those things, and you're right, you should avoid attaching permissions to document data in the index, because if you want to change permissions (and you will want to change them at some point), it can be a cumbersome job, particularly if it involves millions of documents, replication, shards etc. It's also generally a good idea not to tie your schema to permission fields. Another big consideration is authentication - how can you be sure the request is coming from the user you think it is? Is there a certificate involved? Has the user authenticated to the container? If so, how do you get to this? and so on... For permissions storage, there are two realistic approaches to consider: 1. Write a SearchComponent that handles permission requests. This typically involves storing/reading permissions in/from a file, database or separate index (see SOLR-1872) 2. Use an LCF module to retrieve permissions from the original documents themselves (see SOLR-1834) Hope this helps, Peter On Thu, Jan 20, 2011 at 8:44 PM, Rok Rejc wrote: > Hi all, > > I have an index containing a couple of million documents. > Documents are grouped into "groups", each group contains from 1000-2 > documents. > > The problem: > Each group has defined permission settings. It can be viewed by public, > viewed by registered users, or viewed by a list of users (each group has its > own list of users). > Said differently: I need document security. > > What I read from the other threads is that it is not recommended to store > permissions in the index. I have already all the permissions in the > database, but I don't "know" how to connect the database and the index. > I can query the database to get the groups in which the user is and after > that do the OR query, but I am afraid that this list can be too big (100 > OR's could also exceed maximum HTTP GET query string length). > > What are the other options? Should I write a custom collector which will > query (and cache) the database for permissions? > > Any ideas are appreciated... > > Many thanks, Rok >
Re: unix permission styles for access control
Three-dimensional multi value sounds good. Tough choice on character vs full-length words. Full length is easier & less confusing, but with hopefully millions of documents in the future, it increases index size. Sent from Yahoo! Mail on Android
Documentation: For newbies and recent newbies
If someone is looking for good documentation and getting started guides, I am putting this in the newsgroups to be searched upon. I recommend: A/ The Wikis: (FREE) http://wiki.apache.org/solr/FrontPage B/ The book and eBook: (COSTS $45.89) https://www.packtpub.com/solr-1-4-enterprise-search-server/book C/ The (seemingly) total reference guide:(FREE, with registration) http://www.lucidimagination.com/software_downloads/certified/cdrg/lucidworks-solr-refguide-1.4.pdf D/ The webinar on optimizing the search engine to Do a GOOD search, based on YOUR needs, not general ones: (FREE, with registration) http://www.lucidimagination.com/Solutions/Webinars/Analyze-This-Tips-and-tricks-getting-LuceneSolr-Analyzer-index-and-search-your-content Personally, I am working on being more than barely informed on items A & B :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: unix permission styles for access control
So, if I used something like r-u-d-o in a field (read, update, delete, others) I could get it tokenized to those four characters, and then search for those in that field. Is that what you're suggesting? (Thanks, by the way.) An article I read created a 'hybrid' access control system (can't remember if it was ACL or RBAC). It used a primary system like the Unix file system's 9-bit permissions for the primary permissions normally needed on most objects of any kind, and then flagged if there were any other permissions and any other groups. It was very fast for the primary permissions, and fast for the secondary. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Jonathan Rochkind To: "solr-user@lucene.apache.org" Sent: Wed, January 19, 2011 8:40:30 AM Subject: Re: unix permission styles for access control No. There is no built in way to address 'bits' in Solr that I am aware of. Instead you can think about how to transform your data at indexing into individual tokens (rather than bits) in one or more field, such that they are capable of answering your query. Solr works in tokens as the basic unit of operation (mostly, basically), not characters or bytes or bits. On 1/19/2011 9:48 AM, Dennis Gearon wrote: > Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'. > > So 'fieldName.x' is how to address bits? > > > Dennis Gearon > > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a >better > idea to learn from others’ mistakes, so you do not have to make them yourself. > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > EARTH has a Right To Life, > otherwise we all die. > > > > - Original Message > From: Toke Eskildsen > To: "solr-user@lucene.apache.org" > Sent: Wed, January 19, 2011 12:23:04 AM > Subject: Re: unix permission styles for access control > > On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote: >> I was wondering if there are binary operation filters? Haven't seen any in the >> book nor was I able to find any using google. >> >> So if I had 0600(octal) in a permission field, and I wanted to return any >> records that 'permission & 0400(octal)==TRUE', how would I filter that? > Don't you mean permission & 0400(octal) == 0400? Anyway, the > functionality can be accomplished by extending your index a bit. > > > You could split the permission into user, group and all parts, then use > an expanded query. > > If the permission is 0755 it will be indexed as > user_p:7 group_p:5 all_p:5 > > If you're searching for something with at least 0650 your query should > be expanded to > (user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5) > > > Alternatively you could represent the bits explicitly in the index: > user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:4 > > Then a search for 0650 would query with > user_p:2 AND user_p:4 AND group_p:1 AND group_p:4 > > > Finally you could represent all valid permission values, still split > into parts with > user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7 > group_p:1 group_p:2 group_p:3 group_p:4 group_p:5 > all_p:1 all_p:2 all_p:3 all_p:4 all_p:5 > > The query would be simply > user_p:6 AND group_p:5
Re: unix permission styles for access control
Did some more searching this morning. Perhaps being bleary eyed helped :-) I found this JIRA which does bitwise boolean operator filtering: https://issues.apache.org/jira/browse/SOLR-1913 I'm not that sure how to interpret JIRA pages for features. It's 'OPEN', but the comments all say it works. So, what's the syntax for combining filters in queries? I am currently using the spatial filter. How would I write a query that combines: http://localhost:8983/path/to/solr/select/?q={!bitwise field=fieldname op=OPERATION_NAME source=sourcevalue negate=boolean}remainder {!spatial lat=37.393026 long=-121.998304 radius=10 unit=km threadCount=3} ts_begin:[1 TO 2145916800] AND text:"find_this" Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Toke Eskildsen To: "solr-user@lucene.apache.org" Sent: Wed, January 19, 2011 12:23:04 AM Subject: Re: unix permission styles for access control On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote: > I was wondering if there are binary operation filters? Haven't seen any in the > book nor was I able to find any using google. > > So if I had 0600(octal) in a permission field, and I wanted to return any > records that 'permission & 0400(octal)==TRUE', how would I filter that? Don't you mean permission & 0400(octal) == 0400? Anyway, the functionality can be accomplished by extending your index a bit. You could split the permission into user, group and all parts, then use an expanded query. If the permission is 0755 it will be indexed as user_p:7 group_p:5 all_p:5 If you're searching for something with at least 0650 your query should be expanded to (user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5) Alternatively you could represent the bits explicitly in the index: user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:4 Then a search for 0650 would query with user_p:2 AND user_p:4 AND group_p:1 AND group_p:4 Finally you could represent all valid permission values, still split into parts with user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7 group_p:1 group_p:2 group_p:3 group_p:4 group_p:5 all_p:1 all_p:2 all_p:3 all_p:4 all_p:5 The query would be simply user_p:6 AND group_p:5
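On the syntax question: each local-params filter simply goes in its own fq parameter, so a combination along these lines should work (the bitwise parameter values are placeholders copied from the JIRA description above; untested):

  http://localhost:8983/solr/select?q=ts_begin:[1 TO 2145916800] AND text:"find_this"
    &fq={!bitwise field=permissions op=AND source=256}
    &fq={!spatial lat=37.393026 long=-121.998304 radius=10 unit=km threadCount=3}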
Re: unix permission styles for access control
Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'. So 'fieldName.x' is how to address bits? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Toke Eskildsen To: "solr-user@lucene.apache.org" Sent: Wed, January 19, 2011 12:23:04 AM Subject: Re: unix permission styles for access control On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote: > I was wondering if there are binary operation filters? Haven't seen any in the > book nor was I able to find any using google. > > So if I had 0600(octal) in a permission field, and I wanted to return any > records that 'permission & 0400(octal)==TRUE', how would I filter that? Don't you mean permission & 0400(octal) == 0400? Anyway, the functionality can be accomplished by extending your index a bit. You could split the permission into user, group and all parts, then use an expanded query. If the permission is 0755 it will be indexed as user_p:7 group_p:5 all_p:5 If you're searching for something with at least 0650 your query should be expanded to (user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5) Alternatively you could represent the bits explicitly in the index: user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:4 Then a search for 0650 would query with user_p:2 AND user_p:4 AND group_p:1 AND group_p:4 Finally you could represent all valid permission values, still split into parts with user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7 group_p:1 group_p:2 group_p:3 group_p:4 group_p:5 all_p:1 all_p:2 all_p:3 all_p:4 all_p:5 The query would be simply user_p:6 AND group_p:5
Re: unix permission styles for access control
so fieldName.x is how to address bits? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Toke Eskildsen To: "solr-user@lucene.apache.org" Sent: Wed, January 19, 2011 12:23:04 AM Subject: Re: unix permission styles for access control On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote: > I was wondering if there are binary operation filters? Haven't seen any in the > book nor was I able to find any using google. > > So if I had 0600(octal) in a permission field, and I wanted to return any > records that 'permission & 0400(octal)==TRUE', how would I filter that? Don't you mean permission & 0400(octal) == 0400? Anyway, the functionality can be accomplished by extending your index a bit. You could split the permission into user, group and all parts, then use an expanded query. If the permission is 0755 it will be indexed as user_p:7 group_p:5 all_p:5 If you're searching for something with at least 0650 your query should be expanded to (user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5) Alternatively you could represent the bits explicitly in the index: user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:4 Then a search for 0650 would query with user_p:2 AND user_p:4 AND group_p:1 AND group_p:4 Finally you could represent all valid permission values, still split into parts with user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7 group_p:1 group_p:2 group_p:3 group_p:4 group_p:5 all_p:1 all_p:2 all_p:3 all_p:4 all_p:5 The query would be simply user_p:6 AND group_p:5
unix permission styles for access control
I was wondering if there are binary operation filters? Haven't seen any in the book, nor was I able to find any using google. So if I had 0600(octal) in a permission field, and I wanted to return any records that 'permission & 0400(octal)==TRUE', how would I filter that? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: Indexing and Searching Chinese with SolrNet
Make sure your browser is set to UTF-8 encoding. - Original Message From: Otis Gospodnetic To: solr-user@lucene.apache.org; bing...@asu.edu Sent: Tue, January 18, 2011 10:39:16 AM Subject: Re: Indexing and Searching Chinese with SolrNet Bing Li, Go to your Solr Admin page and use the Analysis functionality there to enter some Chinese text and see how it's getting analyzed at index and at search time. This will tell you what is (or isn't) going on. Here it looks like you just defined index-time analysis, so you should see your index-time analysis look very different from your query-time analysis. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Bing Li > To: solr-user@lucene.apache.org > Sent: Tue, January 18, 2011 1:30:37 PM > Subject: Indexing and Searching Chinese with SolrNet > > Dear all, > > After reading some pages on the Web, I created the index with the following > schema. > > .. > positionIncrementGap="100"> > >class="solr.ChineseTokenizerFactory"/> > > > .. > > It must be correct, right? However, when sending a query through SolrNet, no > results are returned. Could you tell me what the reason is? > > Thanks, > LB >
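The schema fragment above was stripped by the mail archiver; from the surviving attributes it presumably looked something like this (the fieldType name is assumed):

  <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.ChineseTokenizerFactory"/>
    </analyzer>
  </fieldType>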
Re: Solr UUID field for externally generated UUIDs
THX, Chris! Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Chris Hostetter To: solr-user@lucene.apache.org Sent: Tue, January 18, 2011 11:35:38 AM Subject: Re: Solr UUID field for externally generated UUIDs : : : The above won't generate a UUID on its own, right? correct. -Hoss
Solr UUID field for externally generated UUIDs
I would like to use the following field declaration to store my own COMB UUIDs (same length and format, a kind of cross between version 1 and version 4). If I leave out the default value in the declaration, would that work? I.E.: The above won't generate a UUID on its own, right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
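The declaration itself was stripped by the archiver; it was presumably along these lines — with default="NEW" Solr generates the UUID itself, while leaving the default out (as below) means the client must always supply the value, which is the behaviour being asked about:

  <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
  <field name="id" type="uuid" indexed="true" stored="true" required="true"/>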
Re: Does Solr supports indexing & search for Hebrew.
Whoops, picked the wrong email to reply thanks to. Wasn't actually in this thread. Dennis Gearon - Original Message From: Dennis Gearon To: solr-user@lucene.apache.org Sent: Tue, January 18, 2011 8:25:04 AM Subject: Re: Does Solr supports indexing & search for Hebrew. Thanks Ofer :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ofer Fort To: solr-user@lucene.apache.org Sent: Tue, January 18, 2011 4:55:53 AM Subject: Re: Does Solr supports indexing & search for Hebrew. take a look at : http://github.com/synhershko/HebMorph with more info at http://www.code972.com/blog/hebmorph/ On Tue, Jan 18, 2011 at 11:04 AM, prasad deshpande < prasad.deshpand...@gmail.com> wrote: > Hello, > > With reference to below links I haven't found Hebrew support in Solr. > > http://wiki.apache.org/solr/LanguageAnalysis > > http://lucene.apache.org/java/3_0_3/api/all/index.html > > If I want to index and search Hebrew files/data then how would I achieve > this? > > Thanks, > Prasad >
Re: Does Solr supports indexing & search for Hebrew.
Thanks Ofer :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ofer Fort To: solr-user@lucene.apache.org Sent: Tue, January 18, 2011 4:55:53 AM Subject: Re: Does Solr supports indexing & search for Hebrew. take a look at : http://github.com/synhershko/HebMorph with more info at http://www.code972.com/blog/hebmorph/ On Tue, Jan 18, 2011 at 11:04 AM, prasad deshpande < prasad.deshpand...@gmail.com> wrote: > Hello, > > With reference to below links I haven't found Hebrew support in Solr. > > http://wiki.apache.org/solr/LanguageAnalysis > > http://lucene.apache.org/java/3_0_3/api/all/index.html > > If I want to index and search Hebrew files/data then how would I achieve > this? > > Thanks, > Prasad >
Re: just got 'the book' already have a question
Thanks Robert. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Robert Muir To: solr-user@lucene.apache.org Sent: Tue, January 18, 2011 1:40:25 AM Subject: Re: just got 'the book' already have a question On Mon, Jan 17, 2011 at 11:10 PM, Dennis Gearon wrote: > First of all, seems like a good book, > > Solr-14-Enterprise-Search-Server.pdf > > Question, is it possible to choose locale at search time? So if my customer is > querying across cultural/national/linguistic boundaries and I have the data for > him in different languages in the same index, can I sort based on his language? > http://wiki.apache.org/solr/UnicodeCollation#Sorting_text_for_multiple_languages
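The wiki page Robert links boils down to one collated sort field per locale — a hedged sketch (field and attribute names assumed, per that page's collation support):

  <fieldType name="sort_fr" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.CollationKeyFilterFactory" language="fr" strength="primary"/>
    </analyzer>
  </fieldType>

so a French user's request would add &sort=title_fr asc while an English user's would use a title_en twin.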
Re: NRT
Thanks Otis Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Otis Gospodnetic To: solr-user@lucene.apache.org Sent: Mon, January 17, 2011 11:15:23 PM Subject: Re: NRT Hi, > How is NRT doing, being used in production? > Which Solr is it in? Unless I missed it, I don't think there is true NRT in Solr just yet. > And is there built in Spatial in that version? > > How is Solr 4.x doing? Well :) 3 ways to know this sort of stuff: * follow the dev list - high volume * subscribe to Sematext Blog - we publish monthly Solr Digests * check JIRA to see how many issues remain to be fixed Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
explicit field type descriptions
Is there any tabular data anywhere on ALL field types and ALL options? For example, I've looked everywhere in the last hour, and I don't see anywhere on Solr site, google, or in the 1.4 manual where it says whether a copyField 'directive' can be made ' required="true" '. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
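For what it's worth, required is an attribute of <field>, not of <copyField> — a copyField only takes source and dest (and later maxChars), so the constraint has to sit on the field itself:

  <field name="title" type="text" indexed="true" stored="true" required="true"/>
  <copyField source="title" dest="all_text"/>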
just got 'the book' already have a question
First of all, seems like a good book, Solr-14-Enterprise-Search-Server.pdf Question, is it possible to choose locale at search time? So if my customer is querying across cultural/national/linguistic boundaries and I have the data for him in different languages in the same index, can I sort based on his language? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
NRT
How is NRT doing, being used in production? Which Solr is it in? And is there built in Spatial in that version? How is Solr 4.x doing? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: use of schema.xml
I could put 1-10,000 fields in any one document, as long as they are told what type or they are dynamically matched by dynamic fields relative to what's in the schema.xml file? It's very much like google 'big tables' or 'elastic search' that way, right? It's up to me to enforce any field names or quantities and assign field types during insert/update? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Thu, January 13, 2011 8:16:54 PM Subject: Re: use of schema.xml Wait- it does enforce the schema names. What it does not enforce is field contents when you change the schema. Since Lucene does not have field replacement, it is not practical to remove or add a field to all existing documents when you change the schema. On Thu, Jan 13, 2011 at 8:15 PM, Lance Norskog wrote: > Correct. Solr and Lucene do not store or enforce the schema. You're on > your own :) > > On Thu, Jan 13, 2011 at 8:09 PM, Dennis Gearon wrote: >> I'm going to buy the book for Solr, since it looks like I need to do more of >>the >> work than I thought I would. >> >> But, from looking at it, the schema file only says: >> >> A/ What types of data can be in the 'fields' of the documents >> B/ If there are any dynamically assigned fields. >> C/ What parsers are available >> D/ other stuff. >> >> And what it DOESN'T do is set the 'schema' for the index, right? >> (like DDL for a database does) >> >> Dennis Gearon >> >> >> Signature Warning >> >> It is always a good idea to learn from your own mistakes. It is usually a >>better >> idea to learn from others’ mistakes, so you do not have to make them yourself. >> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >> >> >> EARTH has a Right To Life, >> otherwise we all die. >> >> > > > > -- > Lance Norskog > goks...@gmail.com > -- Lance Norskog goks...@gmail.com
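A hedged example of the dynamic-field matching being described (the suffix conventions are the stock schema's, assumed here):

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i" type="int" indexed="true" stored="true"/>

A document can then carry color_s, size_i, or any other matching name without it ever being declared explicitly — enforcing which names actually appear is left to the indexing client.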
use of schema.xml
I'm going to buy the book for Solr, since it looks like I need to do more of the work than I thought I would. But, from looking at it, the schema file only says: A/ What types of data can be in the 'fields' of the documents B/ If there are any dynamically assigned fields. C/ What parsers are available D/ other stuff. And what it DOESN'T do is set the 'schema' for the index, right? (like DDL for a database does) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: start value in queries zero or one based?
I'm migrating to CTO/CEO status in life due to building a small company. I find I don't have too much time for theory. I work with what is. So, what is it, not what should it be. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Walter Underwood To: solr-user@lucene.apache.org Sent: Thu, January 13, 2011 1:38:26 PM Subject: Re: start value in queries zero or one based? On Jan 13, 2011, at 1:28 PM, Dennis Gearon wrote: > Do I even need a body for this message? ;-) > > Dennis Gearon Are you asking "is it" or "should it be"? If the latter, we can also discuss Emacs and vi. wunder -- Walter Underwood K6WRU
start value in queries zero or one based?
Do I even need a body for this message? ;-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
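(For the record, Solr's start parameter is zero-based — start=0 is the first hit:

  http://localhost:8983/solr/select?q=*:*&start=0&rows=10    <- hits 1-10
  http://localhost:8983/solr/select?q=*:*&start=10&rows=10   <- hits 11-20)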
basic document crud in an index
OK, getting ready to be more interactive with my index, (she likes me). These are pretty much boolean answered questions to help my understanding. I think having these in the mail list records might help others too. A/ Is there a query that updates all the fields automatically on a record that has a unique id? B/ Does it leave the old document and new document in the index? C/ Will a query immediately following see both documents? D/ Merging does not get rid of any old documents if there are any, but optimize does? E/ Is optimize invoked on the whole index, not individual segments? Thanks for a great product, y'all. I have a 64K document index, small by many standards. But I did a search on it for a test, and started at row 16,000 of the results (broad results), and it was almost not noticeably slower than starting at 0. And it's on the lowest cost Amazon server that will run it. Of course, no one but me is hitting that box yet :-) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
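On A/ and B/: re-adding a document whose uniqueKey already exists overwrites it — there is no partial update, so every field must be resent, and the old version merely sits flagged as deleted until a merge or optimize reclaims it. A minimal sketch (field names assumed):

  <add>
    <doc>
      <field name="id">doc-42</field>
      <field name="title">the replacement version, with all fields included</field>
    </doc>
  </add>

POSTed to /solr/update, followed by <commit/> before the change becomes visible to searchers.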
Re: Exciting Solr Use Cases
When I have it running with a permission system (through both API and front end), I will share it with everyone. It's beginning to happen. The search is fairly primitive for now. But we hope to learn or hire skills to better match it to the business model as we grow/get funding. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Peter Karich To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 3:37:12 PM Subject: Exciting Solr Use Cases Hi all! Would you mind writing about your Solr project if it has an uncommon approach or if it is somehow exciting? I would like to extend my list for a new blog post. Examples I have in mind at the moment are: loggly (real time + big index), solandra (nice solr + cassandra combination), HathiTrust (extreme index size), ... Kind Regards, Peter.
Re: PHP app not communicating with Solr
I was unable to get it to compile. From the author, got one reply about the benefits of the compiled version. After submitting my errors to him, have not yet received a reply. ##Weird thing 'on the way to the forum' today.## I remember reading an article a couple of days ago which said the compiled version is 10-15% faster than the 'pure PHP' Solr library out there, (and it has a lot more capability, that's for sure!) Turns out, this slower pure PHP version uses 'file_get_contents()' (FGC) to do the actual query of the Solr Instance. http://stackoverflow.com/questions/23/file-get-contents-vs-curl-what-has-better-performance The article above shows that FGC is on average 22% slower than using cURL in basic usage. So modifying the 'pure PHP' library with cURL would make up for all of the speed that the compiled SolrPHP has. Dennis Gearon - Original Message From: Lukas Kahwe Smith To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 2:52:46 PM Subject: Re: PHP app not communicating with Solr On 12.01.2011, at 23:50, Eric wrote: > Web page returns the following message: > Fatal error: Uncaught exception 'Exception' with message '"0" Status: >Communication Error' > > This happens in a dev environment, everything on one machine: Windows 7, > WAMP, >CakePHP, Tomcat, Solr, and SolrPHPClient. Error message also references line >334 >of the Service.php file, which is part of the SolrPHPClient. > > Everything works perfectly on a different machine so this problem is probably >related to configuration. On the problem machine, I can reach solr at >http://localhost:8080/solr/admin and it looks correct (AFAIK). I am >documenting >the setup procedures this time around but don't know what's different between >the two machines. > > Google search on the error message shows the message is not uncommon so the >answer might be helpful to others as well. I ran into this issue compiling PHP with --curl-wrappers. regards, Lukas Kahwe Smith m...@pooteeweet.org
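A rough sketch of the swap being suggested — replacing file_get_contents() with cURL in such a client (URL and error handling simplified; not code from either library):

  function solrGet($url) {
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body rather than printing it
      $body = curl_exec($ch);
      curl_close($ch);
      return $body;
  }

  $json = solrGet('http://localhost:8983/solr/select?q=*:*&wt=json');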
Re: Solr trunk for production
What's the syntax for spatial for that version of Solr? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ron Mayer To: solr-user@lucene.apache.org Sent: Wed, January 12, 2011 7:18:10 AM Subject: Re: Solr trunk for production Otis Gospodnetic wrote: > Are people using Solr trunk in serious production environments? I suspect > the > answer is yes, just want to see if there are any gotchas/warnings. Yes, since it seemed the best way to get edismax with this patch[1]; and to get the more update-friendly MergePolicy[2]. Main gotcha I noticed so far is trying to figure out appropriate times to sync with trunk's newer patches; and whether or not we need to rebuild our kinda big (> 1TB) indexes when we do. [1] the patch I needed: https://issues.apache.org/jira/browse/SOLR-2058 [2] nicer MergePolicy https://issues.apache.org/jira/browse/LUCENE-2602
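For trunk at that time, the built-in spatial filters looked roughly like this (field name assumed; these are the geofilt/bbox local params):

  fq={!geofilt sfield=store pt=45.15,-93.85 d=5}   <- great-circle distance filter
  fq={!bbox sfield=store pt=45.15,-93.85 d=5}      <- cheaper bounding-box approximation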
Re: issue with the spatial search with solr
You didn't happen to notice that you have one field named RestaurantLocation and another named RestaurantName, did you? You must be submitting 'restaurantName' as the spatial field, so the geo filter is being applied to a non-geo field. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: ur lops To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 11:13:36 PM Subject: issue with the spatial search with solr Hi, I took the latest build from the hudson and installed on my computer. I have done the following changes in my schema.xml When I run the query like this: HTTP ERROR 500 Problem accessing /solr/select. Reason: The field restaurantName does not support spatial filtering org.apache.solr.common.SolrException: The field restaurantName does not support spatial filtering at org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:86) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:112) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) This is my solr query: select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantName}&pt=45.15,-93.85&d=5 Any help will be highly appreciated. Thanks
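Concretely, pointing sfield at the location field instead of the name field should clear the error (assuming restaurantLocation is the geo-typed field in the stripped schema above):

  select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt sfield=restaurantLocation pt=45.15,-93.85 d=5}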
Re: Input raw log file
A possible shortcut? Write a regex that will parse out the fields as you want them, put that into some shell script that calls Solr? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Grijesh.singh To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 10:46:20 PM Subject: Re: Input raw log file First thing: Solr cannot understand your raw log files as they are. Solr needs data that matches the defined schema, and Solr does not know your log file format. So you have to write a parser program that will parse your log files into an existing Solr-writable format. Then you will be able to index that data. - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239548.html Sent from the Solr - User mailing list archive at Nabble.com.
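As a minimal sketch of that shortcut — assuming an Apache-style access log and a schema that defines id, ip, ts, and url fields (all of those names are invented here, and ts is kept as a plain string to dodge date conversion) — a few lines of PHP can do both the regex pass and the post to Solr's XML update handler:

<?php
// Sketch only: parse an Apache-ish access log with a regex and post
// the rows to Solr's XML update handler. Field names must match
// whatever your schema actually defines.
$docs = '<add>';
foreach (file('access.log') as $i => $line) {
    // e.g.: 1.2.3.4 - - [10/Jan/2011:13:55:36 -0700] "GET /path HTTP/1.1" 200 2326
    if (preg_match('/^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)/', $line, $m)) {
        $docs .= '<doc>'
               . '<field name="id">log-' . $i . '</field>'
               . '<field name="ip">' . htmlspecialchars($m[1]) . '</field>'
               . '<field name="ts">' . htmlspecialchars($m[2]) . '</field>'
               . '<field name="url">' . htmlspecialchars($m[3]) . '</field>'
               . '</doc>';
    }
}
$docs .= '</add>';
$ch = curl_init('http://localhost:8983/solr/update?commit=true');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $docs);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_exec($ch);
curl_close($ch);

Real date fields would still need the timestamp converted to Solr's ISO 8601 format before indexing.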
Re: Multiple Solr instances common core possible ?
NOT sure about any of it, but I THINK that read-only instances, with one Solr instance doing the writes, are possible. I've heard that it's NEVER possible to have multiple Solr instances writing to the same index. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Ravi Kiran To: solr-user@lucene.apache.org Sent: Tue, January 11, 2011 9:15:06 AM Subject: Multiple Solr instances common core possible ? Hello, Is it possible to deploy multiple Solr instances with different context roots pointing to the same Solr core? If I do this, will there be any deadlocks or file handle issues? The reason I need this setup is that I want to expose Solr to a third-party vendor via a different context root. My Solr instance is deployed on Glassfish. Alternately, if there is a configurable way to set up multiple context roots for the same Solr instance, that will suffice at this point in time. Ravi Kiran
How to insert this using Solr PHP?
I am switching between building the query to a Solr instance by hand and doing it with the PHP Solr extension. I have this query that my dev partner said to insert before all the other column searches. What kind of query is it, and how do I get it into the query in an 'OOP' style using the PHP Solr extension? In particular, I'm interested in what the 'q={!...}' part of the query is. Is that a filter query? How do I put it into the query . . . I already asked that ;-) URL_BASE?wt=json&indent=true&start=0&rows=20&q={!spatial lat=xx.x long=xxx.x radius=10 unit=km threadCount=3} OTHER COLUMNS, blah blah bcc: my partner Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
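For what it's worth, the {!spatial ...} prefix is LocalParams syntax — it selects and configures a query parser for the q parameter — so it is not a filter query unless it is put in fq. With the PECL Solr extension the whole string just goes into setQuery(); a sketch (host, port, path, and the lat/long numbers are placeholders):

<?php
// Sketch: pass the {!spatial ...} LocalParams prefix straight through
// as part of the q string; connection details here are placeholders.
$client = new SolrClient(array('hostname' => 'localhost', 'port' => 8983, 'path' => '/solr'));
$query = new SolrQuery();
$query->setQuery('{!spatial lat=45.15 long=-93.85 radius=10 unit=km threadCount=3} OTHER COLUMNS, blah blah');
$query->setStart(0);
$query->setRows(20);
$response = $client->query($query);
print_r($response->getResponse());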
icq or other 'instant gratification' communication forums for Solr
Are there any chatrooms or ICQ rooms for asking questions late at night of people who stay up, or who are on the other side of the planet? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: Improving Solr performance
What I seem to see suggested here is to use different cores for the things you suggested: different types of documents, Access Control Lists. I wonder how sharding would work in that scenario? Me, I plan on: For security: using a permissions field. For different schemas: dynamic fields, with enough premade fields to handle it. The one thing I don't think my approach does well with is statistics. Dennis Gearon - Original Message From: Jonathan Rochkind To: "solr-user@lucene.apache.org" Cc: supersoft Sent: Mon, January 10, 2011 1:08:00 PM Subject: Re: Improving Solr performance I see a lot of people using shards to hold "different types of documents", and it almost always seems to be a bad solution. Shards are intended for distributing a large index over multiple hosts -- that's it. Not for some kind of federated search over multiple schemas, not for access control. Why not put everything in the same index, without shards, and just use an 'fq' limit in order to limit to the specific documents you'd like to search over in a given search? I think that would achieve your goal a lot more simply than shards -- then you use sharding only if and when your index grows to be so large you'd like to distribute it over multiple hosts, and when you do so you choose a shard key that will have more or less equal distribution across shards. Using shards for access control or schema management just leads to headaches. [Apparently Solr could use some highlighted documentation on what shards are really for, as it seems to be a very common issue on this list, someone trying to use them for something else and then inevitably finding problems with that approach.] Jonathan On 1/7/2011 6:48 AM, supersoft wrote: > The reason for this distribution is the kind of the documents. In spite of > having the same schema structure (and solr conf), a document belongs to 1 of > 5 different kinds. > > Each kind corresponds to a concrete shard and due to this, the implemented > client tool avoids searching in all the shards when the user selects just > one or a few of the kinds. The tool runs a multisharded query of the proper > shards. I guess this is a right approach but correct me if I am wrong. > > The real problem of this architecture is the correlation between concurrent > users and response time: > 1 query: n seconds > 2 queries: 2*n seconds each query > 3 queries: 3*n seconds each query > and so... > > This is being a real headache because 1 single query has an acceptable > response time but when many users are accessing the server the > performance drops off badly.
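To make Jonathan's fq suggestion concrete: if every document carried, say, a doc_type field (the name is illustrative), one unsharded index serves all the kinds — q=user+terms&fq=doc_type:kind1 for one kind, or q=user+terms&fq=doc_type:(kind1 OR kind3) for a few of them — and since each distinct fq clause is cached in the filterCache, the per-kind filters are cheap after the first use.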
Re: Box occasionally pegs one cpu at 100%
One other possibility is that the OS or BIOS is doing that, at least on a laptop. There is a new feature where, if the load is low enough, non-multi-threaded applications can be assigned to one processor and that processor has its clock boosted so the older software will run faster on the new processors - otherwise they run SLOWER! My brother has a CAD program that runs slower on his new quad core because the base clock speed is slower than a single-processor CPU. The software company is not taking the time to rewrite their code, except where they add features or fixes. - Original Message From: Brian Burke To: "solr-user@lucene.apache.org" Sent: Mon, January 10, 2011 10:56:27 AM Subject: Re: Box occasionally pegs one cpu at 100% This sounds like it could be garbage collection related, especially with a heap that large. Depending on your jvm tuning, a FGC could take quite a while, effectively 'pausing' the JVM. Have you looked at something like jstat -gcutil or similar to monitor the garbage collection? On Jan 10, 2011, at 1:36 PM, Simon Wistow wrote: > I have a fairly classic master/slave set up. > > Response times on the slave are generally good with blips periodically, > apparently when replication is happening. > > Occasionally however the process will have one incredibly slow query and > will peg the CPU at 100%. > > The weird thing is that it will remain that way even if we stop querying > it and stop replication and then wait for over 20 minutes. The only way > to fix the problem at that point is to restart tomcat. > > Looking at slow queries around the time of the incident they don't look > particularly bad - they're predominantly filter queries running under > dismax and there doesn't seem to be anything unusual about them. > > The index file is about 266G and has 30G of disk free. The machine has > 50G of RAM and is running with -Xmx35G. > > Looking at the processes running it appears to be the main Java thread > that's CPU bound, not the child threads. > > Stracing the process gives a lot of brk instructions (presumably some > sort of wait loop) with occasional blips of: > > > mprotect(0x7fc5721d9000, 4096, PROT_READ) = 0 > futex(0x451c24a4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x451c24a0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x4269dd14, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x4269dd10, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x7fbc941603b4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, > 325, {1294683789, 614186000}, ) = 0 > futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0 > mprotect(0x7fc5721d8000, 4096, PROT_READ) = 0 > mprotect(0x7fc5721d8000, 4096, PROT_READ|PROT_WRITE) = 0 > futex(0x7fbc94eeb5b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fbc94eeb5b0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x426a6a28, FUTEX_WAKE_PRIVATE, 1) = 1 > mprotect(0x7fc5721d9000, 4096, PROT_NONE) = 0 > futex(0x41cae8f4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x41cae8f0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x41cae328, FUTEX_WAKE_PRIVATE, 1) = 1 > futex(0x7fbc941603b4, FUTEX_WAIT_PRIVATE, 327, NULL) = 0 > futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0 > mmap(0x7fc2e023, 121962496, PROT_NONE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = > 0x7fc2e023 > mmap(0x7fbca58e, 237568, PROT_NONE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = > 0x7fbca58e > > Any ideas about what's happening and if there's any way to mitigate it?
> If the box at least recovered then I could run another slave and load > balance between them working on the principle that the second box > would pick up the slack whilst the first box restabilised but, as it is, > that's not reliable. > > Thanks, > > Simon >
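To check the GC theory Brian raises, something like the following (assuming the Sun/Oracle JDK tools are on the path; the pid is whatever your Tomcat JVM's process id is) samples the collectors every 5 seconds:

jstat -gcutil <tomcat-pid> 5000

If the FGC and FGCT columns are climbing while the CPU is pegged, it's full collections on that 35G heap; if they're flat, GC is off the hook and something else owns that thread.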
Re: How to let crawlers in, but prevent their damage?
Hmmm, so if someone says they have SEO skills on their resume, they COULD be talking about optimizing the SEARCH engine at some site, not just a web site to be crawled by search engines? - Original Message From: Ken Krugler To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 9:07:43 AM Subject: Re: How to let crawlers in, but prevent their damage? On Jan 10, 2011, at 7:02am, Otis Gospodnetic wrote: > Hi Ken, thanks Ken. :) > > The problem with this approach is that it exposes very limited content to > bots/web search engines. > > Take http://search-lucene.com/ for example. People enter all kinds of queries > in web search engines and end up on that site. People who visit the site > directly don't necessarily search for those same things. Plus, new terms are > entered to get to search-lucene.com every day, so keeping up with that would > mean constantly generating more and more of those static pages. Basically, the > tail is super long. To clarify - the issue of using actual user search traffic is one of SEO, not what content you expose. If, for example, people commonly do a search for "java " then that's a hint that the URL to the static content, and the page title, should have the language as part of it. So you shouldn't be generating static pages based on search traffic. Though you might want to decide what content to "favor" (see below) based on popularity. > On top of that, new content is constantly being generated, > so one would have to also constantly both add and update those static pages. Yes, but that's why you need to automate that content generation, and do it on a regular (e.g. weekly) basis. The big challenges we ran into were: 1. Dealing with badly behaved bots that would hammer the site. We wound up putting this content on a separate system, so it wouldn't impact users on the main system. And generating a regular report by user agent & IP address, so that we could block by robots.txt and IP when necessary. 2. Figuring out how to structure the static content so that it didn't look like spam to Google/Yahoo/Bing. You don't want to have too many links per page, or too much depth, but that constrains how many pages you can reasonably expose. We had project scores based on code, activity, usage - so we used that to rank the content and focus on exposing early (low depth) the "good stuff". You could do the same based on popularity, from search logs. Anyway, there's a lot to this topic, but it doesn't feel very Solr specific. So apologies for reducing the signal-to-noise ratio with talk about SEO :) -- Ken > I have a feeling there is not a good solution for this because on one hand > people don't like the negative bot side effect, on the other hand people want as > much of their sites indexed by the big guys. The only half-solution that comes > to mind involves looking at who's actually crawling you and who's bringing you > visitors, then blocking those with a bad ratio of those two - bots that crawl a > lot but don't bring a lot of value. > > Any other ideas? > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Ken Krugler >> To: solr-user@lucene.apache.org >> Sent: Mon, January 10, 2011 9:43:49 AM >> Subject: Re: How to let crawlers in, but prevent their damage? >> >> Hi Otis, >> >> From what I learned at Krugle, the approach that worked for us was: >> >> 1. Block all bots on the search page. >> >> 2.
Expose the target content via statically linked pages that are separately >> generated from the same backing store, and optimized for target search terms >> (extracted from your own search logs). >> >> -- Ken >> >> On Jan 10, 2011, at 5:41am, Otis Gospodnetic wrote: >> >>> Hi, >>> >>> How do people with public search services deal with bots/crawlers? >>> And I don't mean to ask how one bans them (robots.txt) or slow them down >> (Delay >>> stuff in robots.txt) or prevent them from digging too deep in search >> results... >>> >>> What I mean is that when you have publicly exposed search that bots crawl, >> they >>> issue all kinds of crazy "queries" that result in errors, that add noise to >> Solr >>> caches, increase Solr cache evictions, etc. etc. >>> >>> Are there some known recipes for dealing with them, minimizing their >> negative >>> side-effects, while still letting them crawl you? >>> >>> Thanks, >>> Otis >>> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch >>> Lucene ecosystem search :: http://search-lucene.com/ >>> >> >> -- >> Ken Krugler >> +1 530-210-6378 >> http://bixolabs.com >> e l a s t i c w e b m i n i n g >> >> >> >> >> >> -- Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
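For the blocking end of the "badly behaved bots" report Ken describes, the polite half is plain robots.txt; a sketch (paths and bot names are illustrative):

User-agent: SomeAbusiveBot
Disallow: /

User-agent: *
Disallow: /search
Crawl-delay: 10

robots.txt only restrains bots that choose to read it; the ones that ignore it are the ones that end up in the IP-level blocks from the user-agent/IP report.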
Re: PHP PECL solr API library
Yeah, it doesn't look like an easy, CRUD-based interface. - Original Message From: Lukas Kahwe Smith To: solr-user@lucene.apache.org Sent: Sun, January 9, 2011 11:33:16 PM Subject: Re: PHP PECL solr API library On 10.01.2011, at 08:16, Dennis Gearon wrote: > Anyone have any experience using this library? > > http://us3.php.net/solr > Yeah, it works quite well. However, IMHO the API is a maze. Also it's lacking critical stuff like escaping, and nice-to-have stuff like Lucene query parsing/rewriting. regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: How to let crawlers in, but prevent their damage?
- Original Message From: lee carroll To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 6:48:12 AM Subject: Re: How to let crawlers in, but prevent their damage? Sorry, not an answer, but a +1 vote for finding out best practice for this. Related to it are DOS attacks. We have rewrite rules in between the proxy server and Solr which attempt to filter out undesirable stuff, but would it be better to have a query app doing this? Any standard rewrite rules which drop invalid or potentially malicious queries would be very nice :-) What exactly are malicious queries? (besides scraping) What's the problem with invalid queries? Unless someone is doing a custom crawl/scraping of your site, how are they going to issue queries that aren't already on the site as URLs? On 10 January 2011 13:41, Otis Gospodnetic wrote: > Hi, > > How do people with public search services deal with bots/crawlers? > And I don't mean to ask how one bans them (robots.txt) or slow them down > (Delay > stuff in robots.txt) or prevent them from digging too deep in search > results... > > What I mean is that when you have publicly exposed search that bots crawl, > they > issue all kinds of crazy "queries" that result in errors, that add noise to > Solr > caches, increase Solr cache evictions, etc. etc. > > Are there some known recipes for dealing with them, minimizing their > negative > side-effects, while still letting them crawl you? > > Thanks, > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > >
Re: How to let crawlers in, but prevent their damage?
I don't know about stopping the problems with the issues that you've raised. But I do know that web sites that aren't idempotent with GET requests are in a hurt locker. That seems to be WAY too many of them. This means: don't do anything with GET that changes the contents of your web site. Regarding a more direct answer to your question, you'd probably have to have some sort of filtering applied. And anyway, crawlers only issue 'queries' based on the URLs found in the site, right? So are you going to have weird URLs embedded in your site? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Otis Gospodnetic To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 5:41:17 AM Subject: How to let crawlers in, but prevent their damage? Hi, How do people with public search services deal with bots/crawlers? And I don't mean to ask how one bans them (robots.txt) or slows them down (Delay stuff in robots.txt) or prevents them from digging too deep in search results... What I mean is that when you have publicly exposed search that bots crawl, they issue all kinds of crazy "queries" that result in errors, that add noise to Solr caches, increase Solr cache evictions, etc. etc. Are there some known recipes for dealing with them, minimizing their negative side-effects, while still letting them crawl you? Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: Improving Solr performance
These are definitely server-grade machines. There aren't any desktops I know of (that aren't made for HD video editing/rendering) that ever need that kind of memory. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Shawn Heisey To: solr-user@lucene.apache.org Sent: Sun, January 9, 2011 4:34:08 PM Subject: Re: Improving Solr performance On 1/7/2011 2:57 AM, supersoft wrote: > have deployed a 5-sharded infrastructure where: shard1 has 3124422 docs > shard2 has 920414 docs shard3 has 602772 docs shard4 has 2083492 docs shard5 > has 11915639 docs Indexes total size: 100GB > > The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420 and I > run the server using Jetty (from Solr example download) with: java -Xmx3024M > -Dsolr.solr.home=multicore -jar start.jar > > The response time for a query is around 2-3 seconds. Nevertheless, if I > execute several queries at the same time the performance goes down > immediately: 1 simultaneous query: 2516ms 2 simultaneous queries: 4250,4469 > ms 3 simultaneous queries: 5781, 6219, 6219 ms 4 simultaneous queries: 6484, > 7203, 7719, 7781 ms... I see from your other messages that these indexes all live on the same machine. You're almost certainly I/O bound, because you don't have enough memory for the OS to cache your index files. With 100GB of total index size, you'll get best results with between 64GB and 128GB of total RAM. Alternatively, you could use SSD to store the indexes instead of spinning hard drives, or put each shard on its own physical machine with RAM appropriately sized for the index. For shard5 on its own machine, at 64GB index size, you might be able to get away with 32GB, but ideally you'd want 48-64GB. Can you do anything to reduce the index size? Perhaps you are storing fields that you don't need to be returned in the search results. Ideally, you should only include enough information to fully populate a search results grid, and retrieve detail information for an individual document from the original data source instead of Solr. Thanks, Shawn
Re: (FQ) Filter Query Caching Differences with OR and AND?
And the sky is blue and the night is black - Original Message From: Jonathan Rochkind To: "solr-user@lucene.apache.org" Sent: Wed, January 5, 2011 2:18:20 PM Subject: Re: (FQ) Filter Query Caching Differences with OR and AND? Um, good or bad for what? It depends. But it's how Solr works either way. On 1/5/2011 5:10 PM, Dennis Gearon wrote: > Is that good or bad? > > Dennis Gearon > > - Original Message > From: Jonathan Rochkind > To: "solr-user@lucene.apache.org" > Cc: Em > Sent: Wed, January 5, 2011 1:53:23 PM > Subject: Re: (FQ) Filter Query Caching Differences with OR and AND? > > Each 'fq' clause is its own cache key. > > 1. fq=foo:bar OR foo:baz > => one entry in filter cache > > 2. fq=foo:bar&fq=foo:baz > => two entries in filter cache, will not use cached entry from #1 > > 3. fq=foo:bar > => One entry, will use cached entry from #2 > > 4. fq=foo:baz > => One entry, will use cached entry from #2. > > So if you do queries in succession using each of those four fq's in order, you > will wind up with 3 entries in the cache. > > Note that "fq=foo:bar OR foo:baz" is not semantically identical to > "fq=foo:bar&fq=foo:baz". Rather, the latter is semantically identical to "fq=foo:bar > AND foo:baz". But "fq=foo:bar&fq=foo:baz" will be two cache entries, and "fq=foo:bar > AND foo:baz" will be one cache entry, and the two won't share any cache entries. > > > On 1/5/2011 3:17 PM, Em wrote: >> Hi, >> >> while reading through some information on the list and in the wiki, I found >> out that something is missing: >> >> When I specify filter queries like this >> >> fq=foo:bar OR foo:baz >> or >> fq=foo:bar&fq=foo:baz >> or >> fq=foo:bar >> or >> fq=foo:baz >> >> How many filter query entries will be cached? >> Two, since there are two filters (foo:bar, foo:baz) or 3, since there are >> three different combinations (foo:bar OR foo:baz, foo:bar, foo:baz)? >> >> Thank you! >
Re: (FQ) Filter Query Caching Differences with OR and AND?
Is that good or bad? Dennis Gearon - Original Message From: Jonathan Rochkind To: "solr-user@lucene.apache.org" Cc: Em Sent: Wed, January 5, 2011 1:53:23 PM Subject: Re: (FQ) Filter Query Caching Differences with OR and AND? Each 'fq' clause is its own cache key. 1. fq=foo:bar OR foo:baz => one entry in filter cache 2. fq=foo:bar&fq=foo:baz => two entries in filter cache, will not use cached entry from #1 3. fq=foo:bar => One entry, will use cached entry from #2 4. fq=foo:baz => One entry, will use cached entry from #2. So if you do queries in succession using each of those four fq's in order, you will wind up with 3 entries in the cache. Note that "fq=foo:bar OR foo:baz" is not semantically identical to "fq=foo:bar&fq=foo:baz". Rather, the latter is semantically identical to "fq=foo:bar AND foo:baz". But "fq=foo:bar&fq=foo:baz" will be two cache entries, and "fq=foo:bar AND foo:baz" will be one cache entry, and the two won't share any cache entries. On 1/5/2011 3:17 PM, Em wrote: > Hi, > > while reading through some information on the list and in the wiki, I found > out that something is missing: > > When I specify filter queries like this > > fq=foo:bar OR foo:baz > or > fq=foo:bar&fq=foo:baz > or > fq=foo:bar > or > fq=foo:baz > > How many filter query entries will be cached? > Two, since there are two filters (foo:bar, foo:baz) or 3, since there are > three different combinations (foo:bar OR foo:baz, foo:bar, foo:baz)? > > Thank you!
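An easy way to watch this behavior on a running instance is the filterCache block on /solr/admin/stats.jsp: issue the four variants above in order and watch lookups, hits, and inserts — inserts should stop growing at 3 (the OR clause plus the two individual clauses), with #3 and #4 registering as hits.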
Re: uuid, COMB uuid, distributed farms
Right, Lance, I meant in the field definition. I appreciate your help and direction. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Tue, January 4, 2011 7:15:07 PM Subject: Re: uuid, COMB uuid, distributed farms 'NOT NULL' in the schema is 'required=true' in a <field> element. 'Search for NOT NULL' is a little odd: you search for a range and then negate the search, meaning for documents with nothing in that field. This standard query does it: -field:[* TO *] On Tue, Jan 4, 2011 at 2:49 PM, Dennis Gearon wrote: > Thanks Lance. > > I will be generating the COMB style of UUID external to Solr. > Prevents a lot of index paging during INSERTS on DBs, maybe Solr too. > > So I would not use 'NEW' in the following, right? > Just leave default out? > Some sort of NOT NULL available in a Solr schema? > > > PHP code to make the COMB style of UUID, > easily adapted to other languages, some solutions already exist: > > > //requires php5_uuid module in PHP > function make_comb_uuid(){ > uuid_create(&$v4); > uuid_make($v4, UUID_MAKE_V4); > uuid_export($v4, UUID_FMT_STR, &$v4String); > $var=gettimeofday(); > return > substr($v4String,0,24).substr(dechex($var['sec'].$var['usec']),0,12); > > } > > > > Dennis Gearon > > > > > - Original Message > From: Lance Norskog > To: solr-user@lucene.apache.org > Sent: Tue, January 4, 2011 2:15:32 PM > Subject: Re: uuid, COMB uuid, distributed farms > > http://wiki.apache.org/solr/UniqueKey > > On Mon, Jan 3, 2011 at 6:55 PM, pankaj bhatt wrote: >> HI Dennis, >> I have used UUID in context of an application where an installation id >> (UUID) is generated by the code. It caters to around 10K users. >> I have not used it in context of SOLR. >> >> / Pankaj Bhatt. >> >> On Mon, Jan 3, 2011 at 11:05 PM, Dennis Gearon wrote: >> >>> Thank you Pankaj. >>> >>> How large was your installation of Solr? I'm hoping to get mine to be >>> multinational and am making plans for that as I go. So having unique ids, >>> UUIDs, >>> that cover a huge addressable space is a requirement. >>> >>> If yours was comparable, how were your replication issues, merging issues, >>> anything else related to getting large datasets searchable and unique? >>> >>> Dennis Gearon >>> >>> >>> Signature Warning >>> >>> It is always a good idea to learn from your own mistakes. It is usually a >>> better >>> idea to learn from others’ mistakes, so you do not have to make them >>> yourself. >>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >>> >>> >>> EARTH has a Right To Life, >>> otherwise we all die. >>> >>> >>> >>> - Original Message >>> From: pankaj bhatt >>> To: solr-user@lucene.apache.org; gear...@sbcglobal.ne >>> Sent: Mon, January 3, 2011 8:55:21 AM >>> Subject: Re: uuid, COMB uuid, distributed farms >>> >>> Hi Dennis, >>> >>> I have used UUID's in my project to identify a basic installation of >>> the client. >>> Can I be of any help. >>> >>> / Pankaj Bhatt. >>> >>> On Mon, Jan 3, 2011 at 3:28 AM, Dennis Gearon >>> wrote: >>> >>> > Planning ahead here. >>> > >>> > Anyone have experience with UUIDs, COMB UUIDs (sequential) in large, >>> > internationally distributed Solr/Database projects?
>>> > >>> > Dennis Gearon >>> > >>> > >>> > Signature Warning >>> > >>> > It is always a good idea to learn from your own mistakes. It is usually a >>> > better >>> > idea to learn from others’ mistakes, so you do not have to make them >>> > yourself. >>> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >>> > >>> > >>> > EARTH has a Right To Life, >>> > otherwise we all die. >>> > >>> > >>> >>> >> > > > > -- > Lance Norskog > goks...@gmail.com > > -- Lance Norskog goks...@gmail.com
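Put concretely, the schema side of that would look something like this (a sketch following the stock example schema; drop default="NEW" since the COMB value arrives from outside, and required="true" is the closest thing to NOT NULL):

<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="id" type="uuid" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

With no default and required="true", a document that arrives without an id is rejected at add time.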
Re: uuid, COMB uuid, distributed farms
Thanks Lance. I will be generating the COMB style of UUID external to Solr. Prevents a lot of index paging during INSERTS on DBs, maybe Solr too. So I would not use 'NEW' in the following, right? Just leave default out? Some sort of NOT NULL available in a Solr schema? PHP code to make the COMB style of UUID, easily adapted to other languages, some solutions already exist:

//requires the php5_uuid (OSSP uuid) module in PHP
function make_comb_uuid(){
  uuid_create(&$v4);                          // allocate a uuid resource
  uuid_make($v4, UUID_MAKE_V4);               // fill it with a random v4 UUID
  uuid_export($v4, UUID_FMT_STR, &$v4String); // render it as a string
  $var=gettimeofday();
  // Keep the first 24 chars of the v4 string and replace the tail with a
  // hex-encoded timestamp, so ids generated later sort later (the COMB
  // trick). The dechex() of the concatenated sec/usec string assumes
  // 64-bit PHP integers.
  return substr($v4String,0,24).substr(dechex($var['sec'].$var['usec']),0,12);
}

Dennis Gearon - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Tue, January 4, 2011 2:15:32 PM Subject: Re: uuid, COMB uuid, distributed farms http://wiki.apache.org/solr/UniqueKey On Mon, Jan 3, 2011 at 6:55 PM, pankaj bhatt wrote: > HI Dennis, > I have used UUID in context of an application where an installation id > (UUID) is generated by the code. It caters to around 10K users. > I have not used it in context of SOLR. > > / Pankaj Bhatt. > > On Mon, Jan 3, 2011 at 11:05 PM, Dennis Gearon wrote: > >> Thank you Pankaj. >> >> How large was your installation of Solr? I'm hoping to get mine to be >> multinational and am making plans for that as I go. So having unique ids, >> UUIDs, >> that cover a huge addressable space is a requirement. >> >> If yours was comparable, how were your replication issues, merging issues, >> anything else related to getting large datasets searchable and unique? >> >> Dennis Gearon >> >> >> Signature Warning >> >> It is always a good idea to learn from your own mistakes. It is usually a >> better >> idea to learn from others’ mistakes, so you do not have to make them >> yourself. >> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >> >> >> EARTH has a Right To Life, >> otherwise we all die. >> >> >> >> - Original Message >> From: pankaj bhatt >> To: solr-user@lucene.apache.org; gear...@sbcglobal.ne >> Sent: Mon, January 3, 2011 8:55:21 AM >> Subject: Re: uuid, COMB uuid, distributed farms >> >> Hi Dennis, >> >> I have used UUID's in my project to identify a basic installation of >> the client. >> Can I be of any help. >> >> / Pankaj Bhatt. >> >> On Mon, Jan 3, 2011 at 3:28 AM, Dennis Gearon >> wrote: >> >> > Planning ahead here. >> > >> > Anyone have experience with UUIDs, COMB UUIDs (sequential) in large, >> > internationally distributed Solr/Database projects? >> > >> > Dennis Gearon >> > >> > >> > Signature Warning >> > >> > It is always a good idea to learn from your own mistakes. It is usually a >> > better >> > idea to learn from others’ mistakes, so you do not have to make them >> > yourself. >> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >> > >> > >> > EARTH has a Right To Life, >> > otherwise we all die. >> > >> > >> >> > -- Lance Norskog goks...@gmail.com
Re: Sub query using SOLR?
Essentially, a subquery is an AND expression where you ask the database to find the identifier or set of identifiers to then use in the query outside the subquery. The data that you put into a Solr index is flattened, denormalized. So take the subquery field values and put them in an AND part of the query to Solr. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Estrada Groups To: "solr-user@lucene.apache.org" Sent: Tue, January 4, 2011 10:33:29 AM Subject: Re: Sub query using SOLR? I am +1 on the interest on how to do this! Adam On Jan 4, 2011, at 1:26 PM, bbarani wrote: > > Hi, > > I am trying to use a subquery in SOLR, is there a way to implement this using > SOLR query syntax? > > Something like > > Related_id: IN query(field=ud, q="type:IT AND manager_12:dave") > > The thing I really want is to use the output of one query as the input of > another query. > > Not sure if it is possible to use the query() function (function query) for > my case.. > > Just want to know if there is a better approach... > > Thanks, > Barani > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Sub-query-using-SOLR-tp2193251p2193251.html > Sent from the Solr - User mailing list archive at Nabble.com.
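As a sketch of that flattening: the SQL-style subquery Related_id IN (SELECT id FROM staff WHERE type='IT' AND manager='dave') — table and column names invented here — becomes, once the type and manager values are denormalized onto each document at index time, a single conjunctive Solr query using the field names from the original post:

q=type:IT AND manager_12:dave AND (whatever the outer query was asking for)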
Re: uuid, COMB uuid, distributed farms
Thank you Pankaj. How large was your installation of Solr? I'm hoping to get mine to be multinational and am making plans for that as I go. So having unique ids, UUIDs, that cover a huge addressable space is a requirement. If yours was comparable, how were your replication issues, merging issues, anything else related to getting large datasets searchable and unique? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: pankaj bhatt To: solr-user@lucene.apache.org; gear...@sbcglobal.ne Sent: Mon, January 3, 2011 8:55:21 AM Subject: Re: uuid, COMB uuid, distributed farms Hi Dennis, I have used UUID's in my project to identify a basic installation of the client. Can I be of any help. / Pankaj Bhatt. On Mon, Jan 3, 2011 at 3:28 AM, Dennis Gearon wrote: > Planning ahead here. > > Anyone have experience with UUIDs, COMB UUIDs (sequential) in large, > internationally distributed Solr/Database projects? > > Dennis Gearon > > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a > better > idea to learn from others’ mistakes, so you do not have to make them > yourself. > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > EARTH has a Right To Life, > otherwise we all die. > >
uuid, COMB uuid, distributed farms
Planning ahead here. Anyone have experience with UUIDs, COMB UUIDs (sequential) in large, internationally distributed Solr/Database projects? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Re: dynamic fields revisited
When my Solr guru gets back, we'll redo the schema and see what happens, thanks! Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Thu, December 30, 2010 4:26:58 PM Subject: Re: dynamic fields revisited solr/admin/analysis.jsp uses the Luke handler. You can browse facets and fields. On Wed, Dec 29, 2010 at 7:46 PM, Ahmet Arslan wrote: >> If I understand you correctly, for an INT dynamic field >> called *_int2 >> filled with a field called my_number_int2 during data >> import, >> in a query I will search the index on the field >> called: >> "my_number_int2" >> >> correct? >> > > Exactly. > > Using http://wiki.apache.org/solr/LukeRequestHandler you can retrieve the real > field names under *_int2, if that helps. > > > > -- Lance Norskog goks...@gmail.com
Re: dynamic fields revisited
- Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Wed, December 29, 2010 6:11:32 PM Subject: Re: dynamic fields revisited >>> B/ Is the search done on the dynamic field name in the schema, or on the name >> that was matched? > The dynamic wildcard field name convention is only implemented by the > code that checks the schema. > It is not in the query syntax. Only the real field names are in the > query syntax or returned facets. If I understand you correctly, for an INT dynamic field called *_int2, filled with a field called my_number_int2 during data import, in a query I will search the index on the field called "my_number_int2", correct?
dynamic fields revisited
Well, getting close to the time when the 'rubber meets the road'. A couple of questions about dynamic fields. A/ How much room in the index do 'non-used' dynamic fields add per record, any? B/ Is the search done on the dynamic field name in the schema, or on the name that was matched? C/ Anyone done something like: //schema file// (representative, not actual) *_int1 *_int2 *_int3 *_int4 *_datetime1 *_datetime2 . . Then have fields in the imported data (especially using a DIH importing from a VIEW) that have custom names like: //import source//(representative, not actual) custom_labelA_int1 custom_labelB_int2 custom_labelC_datetime1 custom_labelD_datetime2 Is this how dynamic fields are used? I was thinking of having approximately 1-20 dynamic fields per datatype of interest. D/ If I wanted all text-based dynamic fields added to some common field in the index (sorry, bad terminology), how is that done? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
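For C/, the declarations sketched above would look something like this in schema.xml (attribute choices are illustrative):

<dynamicField name="*_int1" type="int" indexed="true" stored="true"/>
<dynamicField name="*_int2" type="int" indexed="true" stored="true"/>
<dynamicField name="*_datetime1" type="date" indexed="true" stored="true"/>

A field named custom_labelA_int1 in the import then simply matches the *_int1 pattern. For D/, copyField accepts a wildcard source, so funneling every dynamic text field into a common searchable field is one line — assuming the text-typed dynamic fields share a *_txt suffix and a catch-all 'text' field exists, as in the example schema:

<copyField source="*_txt" dest="text"/>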
Re: Item catagorization problem.
Doesn't indexing/analyzing do this to some degree anyway? Not sure of the algorithm, but something like: how often, how near the top, how many different forms, subject or object of a sentence. That has to have some relevance to what category something is in. The simplest extension to that would be something like a 'sub vocabulary' cross listing. If such and such words were high relevance, then the subject is about this or that. The smartest categorizer is your users, though. So the best way to make that list is to keep track of how close to the top of the search results a user responded, what the words were, and how many search attempts it took. That's what Netflix does. Their goal is to have users get something in the top three off the first search attempt. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Erick Erickson To: solr-user@lucene.apache.org Sent: Thu, December 23, 2010 10:00:05 AM Subject: Re: Item catagorization problem. What you're asking for appears to me to be "auto-categorization", and there's nothing built into Solr to do this. Somehow you need to analyze the documents at index time and add the proper categories, but I have no clue how. This is especially hard with short fields since most auto-categorization algorithms try to do some statistical analysis of the document to figure this out. Best Erick On Thu, Dec 23, 2010 at 8:12 AM, Hasnain wrote: > > Hi all, > > I am using solr in my web application for search purposes. However, I > am having a problem with the default behaviour of the solr search. > > From my understanding, if I query for a keyword, let's say "Laptop", > preference is given to result rows having more occurrences of the search > keyword "Laptop" in the field "name". This, however, is producing > undesirable scenarios, for example: > > 1. I index an item A with "name" value "Sony Laptop". > 2. I index another item B with "name" value: "Laptop bags for laptops". > 3. I search for the keyword "Laptop" > > According to the default behaviour, precedence would be given to item B > since the keyword appears more times in the "name" field for that item. > > In my schema, I have another field by the name of "Category" and, for > example's sake, let's assume that my application supports only two > categories: computers and accessories. Now, what I require is a mechanism > to > assign correct categories to the items during item indexing so that this > field can be used to better filter the search results, item A would belong > to the "Computers" category and item B would belong to the "Accessories" category. > So > then, searching for "Laptop" would only look for items in the "Computers" > category and return item A only. > > I would like to point out here that setting the category field manually is > not an option since the data might be in the vicinity of thousands of > records. I am not asking for an in-depth algorithm. Just a high level > design > would be sufficient to set me in the right direction. > > thanks. > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Item-catagorization-problem-tp2136415p2136415.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Recap on derived objects in Solr Index, 'schema in a can'
I think I'm just going to have to have my partner and me play with both cores and dynamic fields. If multiple cores are queried, and the schemas match up in order and position for the base fields, the 'extra' fields in the different cores just show up in the result set with their field names? The query against different cores, with 'base attributes' and 'extended attributes', has to be tailored for each core, right? I.e., not querying for fields that don't exist? (That could be handled by making the query a server-side language object with inheritance for the extended fields.) Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Wed, December 22, 2010 1:45:04 PM Subject: Re: Recap on derived objects in Solr Index, 'schema in a can' A dynamic field just means that the schema allows any field with a name matching the wildcard. That's all. There is no support for referring to all of the existing fields in the wildcard. That is, there is no support for "*_en:word" as a field search. Nor is there any kind of grouping for facets. The feature for addressing a particular field in some of the parameters does not support wildcards. If you add wildcard fields, you have to remember what they are. On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon wrote: > I'm open to cores, if it's the faster (indexing/querying/keeping mentally > straight) way to do things. > > But from what you say below, the eventual goal of the site would mean either 100 > extra 'generic' fields, or 1,000-100,000's of cores. > Probably cores is easier to administer for security and does more accurate > querying? > > What is the relationship between dynamic fields and the schema? > > Dennis Gearon > > > Signature Warning > > It is always a good idea to learn from your own mistakes. It is usually a better > idea to learn from others’ mistakes, so you do not have to make them yourself. > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' > > > EARTH has a Right To Life, > otherwise we all die. > > > > - Original Message > From: Erick Erickson > To: solr-user@lucene.apache.org > Sent: Wed, December 22, 2010 10:44:27 AM > Subject: Re: Recap on derived objects in Solr Index, 'schema in a can' > > No, one cannot ignore the schema. If you try to add a field not in the > schema you get > an error. One could, however, use any arbitrary subset > of the fields defined in the schema for any particular #document# in the > index. Say > your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one > doc, and > fields f6-f10 in another and f1, f4, f9 in another, and so on. > > The only field(s) that #must# be in a document are the required="true" > fields. > > There's no real penalty for omitting fields from particular documents. This > allows > you to store "special" documents that aren't part of normal searches. > > You could, for instance, use a document to store meta-information about your > index that had whatever meaning you wanted in a field(s) that *no* other > document > had. Your app could then read that "special" document and make use of that > info. > Searches on "normal" documents wouldn't return that doc, etc.
> > You could effectively have N indexes contained in one index where a document > in each logical sub-index had fields disjoint from the other logical > sub-indexes. > Why you'd do something like that rather than use cores is a very good > question, > but you #could# do it that way... > > All this is much different from a database where there are penalties for > defining > a large number of unused fields. > > Whether doing this is wise or not given the particular problem you're trying > to > solve is another discussion .. > > Best > Erick > > On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon wrote: > >> Based on more searches and manual consolidation, I've put together some of >> the ideas for this already suggested in a summary below. The last item in >> the >> summary >> seems to be interesting, low technical cost way of doing it. >> >> Basically, it treats the index like a 'BigTable', a la "No SQL". >> >> Erick E
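For the "aggregate them in during the query" part of the question above, the usual mechanism is distributed search over the cores, e.g. (host and core names illustrative):

http://localhost:8983/solr/core0/select?q=base_field:foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

with the standing caveat from the distributed-search docs that the uniqueKey field must be unique across all the shards; fields that exist in only one core simply come back on the documents that have them.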
Re: Recap on derived objects in Solr Index, 'schema in a can'
I'm open to cores, if it's the faster (indexing/querying/keeping mentally straight) way to do things. But from what you say below, the eventual goal of the site would mean either 100 extra 'generic' fields, or 1,000-100,000's of cores. Probably cores is easier to administer for security and does more accurate querying? What is the relationship between dynamic fields and the schema? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Erick Erickson To: solr-user@lucene.apache.org Sent: Wed, December 22, 2010 10:44:27 AM Subject: Re: Recap on derived objects in Solr Index, 'schema in a can' No, one cannot ignore the schema. If you try to add a field not in the schema you get an error. One could, however, use any arbitrary subset of the fields defined in the schema for any particular #document# in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, and fields f6-f10 in another and f1, f4, f9 in another, and so on. The only field(s) that #must# be in a document are the required="true" fields. There's no real penalty for omitting fields from particular documents. This allows you to store "special" documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index that had whatever meaning you wanted in a field(s) that *no* other document had. Your app could then read that "special" document and make use of that info. Searches on "normal" documents wouldn't return that doc, etc. You could effectively have N indexes contained in one index where a document in each logical sub-index had fields disjoint from the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you #could# do it that way... All this is much different from a database where there are penalties for defining a large number of unused fields. Whether doing this is wise or not given the particular problem you're trying to solve is another discussion .. Best Erick On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon wrote: > Based on more searches and manual consolidation, I've put together some of > the ideas for this already suggested in a summary below. The last item in the > summary > seems to be an interesting, low-technical-cost way of doing it. > > Basically, it treats the index like a 'BigTable', a la "No SQL". > > Erick Erickson pointed out: > "...but there's absolutely no requirement > that all documents in SOLR have the same fields..." > > I guess I don't have the right understanding of what goes into a Document > in Solr. Is it just a set of fields, each with its own independent field type > declaration/id, its name, and its content? > > So even though there's a schema for an index, one could ignore it and > just throw any other named fields and types and content at document addition > time? > > So if I wanted to search on a base set, all documents having it, I could then > additionally filter based on the (might be wrong use of this) dynamic fields? > > > > > > > Original Thread that I started: > > http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html > > > - > - > > Repeat of the problem, (not actual ratios, numbers, i.e.
could be WORSE!): > > 1/ Base object of some kind, x number of fields > 2/ Derived objects representing Divisions in the company, different customer bases, > etc., > each having 2 additional, unique fields. > 3/ Assume 1000 such derived object types > 4/ A 'flattened' index would have the x base object fields, > and 2000 additional fields > > > > Solutions Posited > --- > > A/ First thought, multi-value columns as key pairs. > 1/ Difficult to access individual items of more than one 'word' length > for querying in multivalued fields. > 2/ All sorts of statistical stuff probably wouldn't apply? > 3/ (James Dyer said:) There
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Have you investigated 'field collapsing'? I believe that it is at least the 'DISTINCT' part. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: dan sutton To: solr-user Sent: Wed, December 22, 2010 1:29:23 AM Subject: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
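One approximation with plain faceting (no field collapsing needed) is to facet on the field, return all buckets, and count them client-side; facet.mincount=1 plays the role of length(field) > 0, and the q/fq carry the other_criteria:

select?q=other_criteria&rows=0&facet=true&facet.field=field&facet.limit=-1&facet.mincount=1

The number of entries returned under facet_fields is the distinct count. The caveat is that the count comes back as the size of the list rather than a single number, so very high-cardinality fields make for a big response.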
Re: Consequences for using multivalued on all fields
Thank you for the input. You might have seen my posts about doing a flexible schema for derived objects. Sounds like dynamic fields might be the ticket. We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a comment about it when it gets there. I don't know if I would gain anything, but I think that ALL booleans that were NOT in the base object but were in the derived objects could be put into one field as textually positioned key:value pairs, at least for search purposes. Since the derived object would have its own, additional methods, one of those methods could be to 'unserialize' the 'boolean column'. In fact, that could be a base object function - empty boolean column values just end up not populating any extra base object attributes. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: kenf_nc To: solr-user@lucene.apache.org Sent: Tue, December 21, 2010 6:07:51 AM Subject: Re: Consequences for using multivalued on all fields I have about 30 million documents and with the exception of the Unique ID, Type and a couple of date fields, every document is made of dynamic fields. Now, I only have maybe 1 in 5 being multi-value, but search and facet performance doesn't look appreciably different from a fixed schema solution. I don't do some of the fancier things, highlighting, spell check, etc. And I use a lot more string or lowercase field types than I do Text (so not as many fully tokenized fields), that probably helps with performance. The only disadvantage I know of is dealing with field names at runtime. Depending on your architecture, you don't really know what your document looks like until you have it in a result set. For what I'm doing, that isn't a problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Consequences-for-using-multivalued-on-all-fields-tp2125867p2126120.html Sent from the Solr - User mailing list archive at Nabble.com.
Recap on derived objects in Solr Index, 'schema in a can'
Based on more searches and manual consolidation, I've put together some of the ideas for this already suggested in a summary below. The last item in the summary seems to be an interesting, low-technical-cost way of doing it. Basically, it treats the index like a 'BigTable', a la "No SQL". Erick Erickson pointed out: "...but there's absolutely no requirement that all documents in SOLR have the same fields..." I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content? So even though there's a schema for an index, one could ignore it and just throw any other named fields and types and content at document addition time? So if I wanted to search on a base set, all documents having it, I could then additionally filter based on the (might be wrong use of this) dynamic fields? Original Thread that I started: http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html - Repeat of the problem, (not actual ratios, numbers, i.e. could be WORSE!): - 1/ Base object of some kind, x number of fields 2/ Derived objects representing Divisions in the company, different customer bases, etc., each having 2 additional, unique fields. 3/ Assume 1000 such derived object types 4/ A 'flattened' index would have the x base object fields, and 2000 additional fields Solutions Posited --- A/ First thought, multi-value columns as key pairs. 1/ Difficult to access individual items of more than one 'word' length for querying in multivalued fields. 2/ All sorts of statistical stuff probably wouldn't apply? 3/ (James Dyer said:) There's also one "gotcha" we've experienced when searching across multi-valued fields: SOLR will match across field occurrences. In the example below, if you were to search q=contrib_name:(james AND smith), you will get this record back. It matches one name from one contributor and another name from a different contributor. This is not what our users want. As a work-around, I am converting these to phrase queries with slop: "james smith"~50 ... Just use a slop # smaller than your positionIncrementGap and bigger than the # of terms entered. This will prevent the cross-field matches yet allow the words to occur in any order. The problem with this approach is that Lucene doesn't support wildcards in phrases B/ Dynamic fields were suggested, but I am not sure exactly how they work, and the person who suggested it was not sure it would work, either. C/ Different field naming conventions were suggested where field types were similar. I can't predict that. D/ Found this old thread, and it had other suggestions: 1/ Use multiple cores, one for each record type/schema, aggregate them in during the query. 2/ Use a fixed number of additional fields X 2. Each additional field is actually a pair of fields. The first of the pair gives the column name, the second gives the data. a) Although I like this, I wonder how many extra fields to use, b) it was pointed out that relevancy and other statistical criteria for queries might suffer. 3/ Index the different objects exactly as they are, i.e. as Erick Erickson said: "I'm not entirely sure this is germane, but there's absolutely no requirement that all documents in SOLR have the same fields. So it's possible for you to index the "wildly different content" in "wildly different fields" . Then searching for screen:LCD would be straightforward."...
Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
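And a rough sketch of option D/2 from the recap above, a fixed bank of generic name/value field pairs; the ext_name_N/ext_val_N names and the pair count are hypothetical:

    MAX_PAIRS = 10  # the open question from the post: how many pairs to reserve

    def to_paired_fields(extras):
        # {"region": "EMEA", "rush": "true"} -> generic paired fields, where
        # ext_name_N holds the derived column's name and ext_val_N its data.
        doc = {}
        for i, (name, value) in enumerate(sorted(extras.items())):
            if i >= MAX_PAIRS:
                raise ValueError("derived object has more extras than reserved pairs")
            doc["ext_name_%d" % i] = name
            doc["ext_val_%d" % i] = value
        return doc

    # to_paired_fields({"region": "EMEA"})
    #   -> {"ext_name_0": "region", "ext_val_0": "EMEA"}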
Re: A schema inside a Solr Schema (Schema in a can)
Here is a thread on this subject that I did not find earlier. Sometimes discussion, thought, and 'mulling' in the subconscious gets me better Google searches. http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-td811883.html

Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Dennis Gearon To: solr-user@lucene.apache.org Sent: Mon, December 20, 2010 10:19:53 AM Subject: Re: A schema inside a Solr Schema (Schema in a can)

Thanks James. So being accurate with fields within fields (multivalues) is probably not possible using the currently available analyzers.

- Original Message From: "Dyer, James" To: "solr-user@lucene.apache.org" Sent: Mon, December 20, 2010 7:16:43 AM Subject: RE: A schema inside a Solr Schema (Schema in a can)

Dennis, If you need to search a key/value pair, you'll have to put them both in the same field, somehow. One way is to re-index them using the key in the fieldname. For instance, suppose you have:

contributor: dyer, james
contributor: smith, sam
role: author
role: editor

...but you want to search only for authors, you could index these again with fieldnames like:

contrib_author: dyer, james
contrib_editor: smith, sam

Then you would query q=contributor:smith to search all contributors and q=contrib_editor:smith just to get editors. Another way to do it is to use some type of marker character sequence to define the "key" and index it like this:

contributor: dyer, james __author
contributor: smith, sam __editor

Then you can query like this: q=contributor:"smith __editor"~50 ... to search only for editors named Smith. We are not yet fully developed here on SOLR, but we currently use both of these approaches with a different search engine. One nice thing SOLR could add to this second approach that is not an option with our other system is the possibility of writing a custom analyzer that could maybe take some of the complexity out of the app. Not sure exactly how it'd work though...

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: Dennis Gearon [mailto:gear...@sbcglobal.net] Sent: Friday, December 17, 2010 6:52 PM To: solr-user@lucene.apache.org Subject: RE: A schema inside a Solr Schema (Schema in a can)

So this is a currently usable plugin (except for the latest bug)? And is it possible to search within just one key:value pair in a multivalued field?

Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

--- On Fri, 12/17/10, Ahmet Arslan wrote:
> From: Ahmet Arslan
> Subject: RE: A schema inside a Solr Schema (Schema in a can)
> To: solr-user@lucene.apache.org
> Date: Friday, December 17, 2010, 12:47 PM
> > The problem with this approach is that Lucene doesn't
> > support wildcards in phrases.
> With https://issues.apache.org/jira/browse/SOLR-1604 you can do that.
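A minimal Python sketch of James's two workarounds above; the helper names are mine, and the slop value just has to be smaller than the field's positionIncrementGap and bigger than the number of terms entered:

    def docs_keyed_fieldnames(contributors):
        # Workaround 1: fold the role into the field name (contrib_author,
        # contrib_editor), keeping a catch-all 'contributor' field for
        # role-agnostic searches like q=contributor:smith.
        doc = {}
        for name, role in contributors:
            doc.setdefault("contrib_%s" % role, []).append(name)
            doc.setdefault("contributor", []).append(name)
        return doc

    def marker_field_values(contributors):
        # Workaround 2: append a marker token to each value; query with
        # phrase slop, e.g. q=contributor:"smith __editor"~50
        return ["%s __%s" % (name, role) for name, role in contributors]

    people = [("dyer, james", "author"), ("smith, sam", "editor")]
    # docs_keyed_fieldnames(people) ->
    #   {"contrib_author": ["dyer, james"], "contrib_editor": ["smith, sam"],
    #    "contributor": ["dyer, james", "smith, sam"]}
    # marker_field_values(people) -> ["dyer, james __author", "smith, sam __editor"]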