Re: find all two word phrases that appear in more than one document

2013-09-09 Thread Ali, Saqib
Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I
am not sure I follow. Do you have an example or some better documentation?
Thanks! :)


On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote:

> The "phrases" are usually called n-grams or shingles.
>
> You can probably use ShingleFilterFactory to create your shingles (possibly
> with outputUnigrams=false) and then use TermsComponent (
> http://wiki.apache.org/solr/TermsComponent) to list the results.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Tue, Sep 10, 2013 at 8:22 AM, Ali, Saqib  wrote:
>
> > Dear Solr Ninjas,
> >
> > We would like to run a query that returns two-word phrases that appear in
> > more than one document. For example, take the string "Solr Ninja". Since it
> > appears in more than one document in our Solr instance, the query should
> > return it. The query should find all such phrases across all the documents
> > in our Solr instance, by looking for combinations of two adjacent words
> > (forming a phrase). These two-word combinations should come from the
> > documents in the Solr index.
> >
> > Any ideas on how to write this query?
> >
> > Thanks.
> >
>
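
A minimal sketch of the suggestion above, not a tested recipe. The first two functions only illustrate, in plain Python, what shingling plus a document-frequency cutoff computes. The last one builds TermsComponent parameters, assuming a request handler with the TermsComponent enabled and a field (here the hypothetical name "shingles") whose analyzer uses ShingleFilterFactory with outputUnigrams=false.

```python
from collections import Counter
from urllib.parse import urlencode


def two_word_shingles(text):
    """Return the set of adjacent two-word phrases in a text (lowercased)."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def phrases_in_multiple_docs(docs, min_docs=2):
    """Count how many documents contain each shingle; keep those >= min_docs."""
    doc_freq = Counter()
    for text in docs:
        # A set per document, so each phrase is counted once per document.
        doc_freq.update(two_word_shingles(text))
    return {phrase: n for phrase, n in doc_freq.items() if n >= min_docs}


def terms_query(field, min_docs=2, limit=100):
    """Build TermsComponent parameters; terms.mincount filters on document frequency."""
    return urlencode({
        "terms": "true",
        "terms.fl": field,           # hypothetical shingled field name
        "terms.mincount": min_docs,  # only terms found in at least this many documents
        "terms.limit": limit,
        "wt": "json",
    })
```

The query string returned by terms_query would be appended to whatever handler exposes the TermsComponent in your solrconfig.xml; the handler path is deployment-specific and is not assumed here.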


find all two word phrases that appear in more than one document

2013-09-09 Thread Ali, Saqib
Dear Solr Ninjas,

We would like to run a query that returns two-word phrases that appear in
more than one document. For example, take the string "Solr Ninja". Since it
appears in more than one document in our Solr instance, the query should
return it. The query should find all such phrases across all the documents
in our Solr instance, by looking for combinations of two adjacent words
(forming a phrase). These two-word combinations should come from the
documents in the Solr index.

Any ideas on how to write this query?

Thanks.


Re: removing duplicates

2013-08-21 Thread Ali, Saqib
Thanks Aloke and Robert. Can you please give me code/query snippets?
(newbie here)


On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal  wrote:

> Hi,
>
> Facet by one of the duplicate fields (probably by the numeric field that
> you mentioned) and set facet.mincount=2.
>
> Regards,
> Aloke
>
>
> On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib  wrote:
>
> > hello,
> >
> > We have documents that are duplicates, i.e. the ID is different, but the
> > rest of the fields are the same. Is there a query that can remove the
> > duplicates and leave just one copy of each document in Solr? There is one
> > numeric field that we can key off to find duplicates.
> >
> > Please advise.
> >
> > Thanks
> >
>
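
A minimal sketch of Aloke's faceting suggestion. The field name "doc_number" is hypothetical and stands in for the numeric field mentioned above. Note that faceting only finds the duplicated values; deleting the extra copies is a separate step (for example a delete by ID), which is not shown. The helper assumes the default flat JSON layout for facet counts (value, count, value, count, ...).

```python
from urllib.parse import urlencode


def duplicate_facet_query(field):
    """Build select parameters that list every value of `field` used by 2+ documents."""
    return urlencode({
        "q": "*:*",
        "rows": 0,             # only the facet counts are needed, not the documents
        "facet": "true",
        "facet.field": field,  # hypothetical field name supplied by the caller
        "facet.mincount": 2,   # skip values that occur in a single document
        "facet.limit": -1,     # no cap on the number of facet values returned
        "wt": "json",
    })


def duplicated_values(flat_facet_counts):
    """Turn [value, count, value, count, ...] into a list of (value, count) pairs."""
    values = flat_facet_counts[0::2]
    counts = flat_facet_counts[1::2]
    return list(zip(values, counts))
```

Each value returned can then be queried on its own to pick the one document to keep.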


removing duplicates

2013-08-21 Thread Ali, Saqib
hello,

We have documents that are duplicates, i.e. the ID is different, but the
rest of the fields are the same. Is there a query that can remove the
duplicates and leave just one copy of each document in Solr? There is one
numeric field that we can key off to find duplicates.

Please advise.

Thanks


Re: [solr 4.4.0] SPLITSHARD and core autodiscovery

2013-08-02 Thread Ali, Saqib
Dmitry,

That is expected behaviour. You need to manually remove the original core.

Thanks.


On Fri, Aug 2, 2013 at 6:03 AM, Dmitry Kan  wrote:

> Hello list,
>
> I was wondering whether what I see with the shard split is correct behaviour
> or whether something is wrong.
>
> Following this article:
>
> http://searchhub.org/2013/06/19/shard-splitting-in-solrcloud/
>
> I have issued a low-level core split query:
>
>
> http://localhost:8982/solr/admin/cores?core=core1&action=SPLIT&path=multicore/core11&path=multicore/core12
>
> which has completed successfully. Two new index directories got created
> under example/multicore directory.
>
> What didn't happen is core autodiscovery.  That is, the dashboard page
> still shows the original core core1.
>
> Is this expected or a bug?
>
> On a separate note: after splitting a core into two new cores, how does
> search routing work in a non-SolrCloud environment? Is this taken care of
> by Solr (via the original core) or is it a client-side task?
>
> Thanks,
>
> Dmitry
>


Re: uniqueKey: string vs. long integer

2013-08-01 Thread Ali, Saqib
I think I have found an issue with using the long integer for
uniqueKey: document routing using the ! notation will not work with a long
integer uniqueKey :(


Thanks Jack and Robi


On Thu, Aug 1, 2013 at 10:05 AM, Petersen, Robert <
robert.peter...@mail.rakuten.com> wrote:

> Hi guys,
>
> We have used an integer as our unique key since solr 1.3 with no problems
> at all.  We never thought of using anything else because our solr unique
> key is based upon our product sku data base field which is defined as an
> integer also.   We're on solr 3.6.1 currently.
>
> Thanks
> Robi
>
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Thursday, August 01, 2013 9:27 AM
> To: solr-user@lucene.apache.org
> Subject: Re: uniqueKey: string vs. long integer
>
> Although I cringe at the thought of anybody using anything other than a
> string for the unique key for a document, I can't point to any part of Solr
> that will absolutely fail. I wouldn't be surprised if there weren't a few
> nooks and crannies in Solr that might depend on the type of the ID, or at
> least depend on it being able to be converted to and from a string. I'm not sure
> if SolrCloud has any dependence on the document ID field type.
>
> Could you inquire as to why this third party chose to go with a non-string
> document key? Just curious if they perceived some advantage. I mean, is the
> key used in numeric calculations? Can it be negative? Is it ever sorted?
>
> But as a Solr best practice, I'd advise against it.
>
> -- Jack Krupansky
>
> -Original Message-
> From: Ali, Saqib
> Sent: Thursday, August 01, 2013 12:02 PM
> To: solr-user@lucene.apache.org
> Subject: uniqueKey: string vs. long integer
>
> We have an application that was developed by a third party. It uses
> uniqueKey that is a long integer instead of a string. Will there be any
> repercussions of using a long integer instead of string for the uniqueKey?
>
> Thanks! :)
>
>
>
>
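
A small illustration of the routing issue reported at the top of this message, assuming the "shardKey!docId" form of the ! notation. The function names and sample values are hypothetical. The point is only that an ID carrying a routing prefix is no longer a valid long, so it cannot be stored in a long-typed uniqueKey field.

```python
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1


def composite_id(shard_key, doc_id):
    """Build a routed document ID of the form shardKey!docId."""
    return f"{shard_key}!{doc_id}"


def fits_long_field(value):
    """Return True if the ID string parses as a 64-bit signed integer."""
    try:
        number = int(value)
    except ValueError:
        return False
    return LONG_MIN <= number <= LONG_MAX
```

Even a purely numeric shard key does not help, since the "!" separator itself makes the value non-numeric.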


uniqueKey: string vs. long integer

2013-08-01 Thread Ali, Saqib
We have an application that was developed by a third party. It
uses uniqueKey that is a long integer instead of a string. Will there be
any repercussions of using a long integer instead of string for the
uniqueKey?

Thanks! :)


Re: FieldCollapsing issues in SolrCloud 4.4

2013-07-31 Thread Ali, Saqib
Hello Paul,

Can you please explain what you mean by:
"To get the exact number of groups, you need to shard along your grouping
field"

Thanks! :)


On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel wrote:

> Do you mean you get different results with group=true?
> numFound is supposed to return the number of ungrouped hits.
>
> To get the number of groups, you are expected to set
> group.ngroups=true.
> Even then, the result will only give you an upper bound
> in a distributed environment.
> To get the exact number of groups, you need to shard along
> your grouping field.
>
> If you have many groups, you may also experience a huge performance
> hit, as the current implementation has been heavily optimized for a low
> number of groups (e.g. e-commerce categories).
>
> Paul
>
>
>
> On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib  wrote:
>
> > Hello all,
> >
> > Is anyone experiencing issues with the numFound when using group=true in
> > SolrCloud 4.4?
> >
> > Sometimes the results are off for us.
> >
> > I will post more details shortly.
> >
> > Thanks.
> >
>
>
>
> --
> __
>
>  Masurel Paul
>  e-mail: paul.masu...@gmail.com
>
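
A sketch of why the group count is only an upper bound unless documents are sharded along the grouping field, assuming each shard reports its own distinct group count and the counts are summed. The shard contents and the field name passed to grouped_query are made up for illustration.

```python
from urllib.parse import urlencode


def grouped_query(q, group_field):
    """Build select parameters that request grouping plus the number of groups."""
    return urlencode({
        "q": q,
        "group": "true",
        "group.field": group_field,  # hypothetical field name supplied by the caller
        "group.ngroups": "true",
        "wt": "json",
    })


def ngroups_summed_per_shard(shards):
    """Sum of distinct group values per shard (the distributed estimate)."""
    return sum(len(set(values)) for values in shards)


def ngroups_exact(shards):
    """Distinct group values across all shards (the true group count)."""
    return len(set().union(*shards))


# Group "b" lives on both shards, so it is counted twice in the sum.
scattered = [["a", "b"], ["b", "c"]]
# Every group lives on exactly one shard, so the sum is exact.
colocated = [["a", "b", "b"], ["c"]]
```

With the scattered layout the summed count is 4 while the true count is 3; with the co-located layout both are 3, which is the effect of sharding along the grouping field.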


FieldCollapsing issues in SolrCloud 4.4

2013-07-30 Thread Ali, Saqib
Hello all,

Is anyone experiencing issues with the numFound when using group=true in
SolrCloud 4.4?

Sometimes the results are off for us.

I will post more details shortly.

Thanks.


Using HP SiteScope to monitor individual Solr shards

2013-07-30 Thread Ali, Saqib
We would like to use HP SiteScope to monitor the availability of
the individual Solr shards. Any ideas on how we can do that? Is there a
shard based URL that is a sure shot of knowing that the shard is feeling
healthy?

Thanks! :)


Re: monitor jvm heap size for solrcloud

2013-07-26 Thread Ali, Saqib
You can use SPM (I think):
http://sematext.com/spm/solr-performance-monitoring/


On Fri, Jul 26, 2013 at 1:36 PM, Joshi, Shital  wrote:

> We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. While
> running stress tests, we want to monitor JVM heap size across 10 nodes. Is
> there a utility which would connect to all nodes' jmx port and display all
> bean details for the cloud?
>
> Thanks!
>
>
>


maximum number of documents per shard?

2013-07-23 Thread Ali, Saqib
still 2.1 billion documents?


Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Ali, Saqib
Thanks Alan and Shawn. Just installed Solr 4.4, and no longer experiencing
the issue.

Thanks! :)


On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey  wrote:

> On 7/23/2013 7:50 AM, Alan Woodward wrote:
> > Can you try upgrading to the just-released 4.4?  Solr.xml persistence
> > had all kinds of bugs in 4.3, which should have been fixed now.
>
> The 4.4.0 release has been finalized and uploaded, but the download link
> hasn't been changed yet because the mirror network isn't fully
> synchronized yet.  It is available from many mirrors, but until the
> website download links get changed, there's not yet a direct way to
> access it.
>
> Here's some generic instructions for situations where the new version is
> done, but the official announcement isn't out yet:
>
> http://lucene.apache.org/solr/
>
> 1) Go to the Solr website (URL above) and click on the latest version
> download button, which at this moment is 4.3.1.  Wait for the redirect
> to take you to a mirror list.
>
> 2) Click on one of the mirrors, the best option is usually the one right
> on top that the website chose for you.
>
> 3) When the file list comes up, click the "Parent Directory" link.  If
> this isn't showing, it will most likely be labelled with ".." instead.
>
> 4) If a directory for the new version (in this case 4.4.0) is listed,
> click on it and then click the file that you want to download.
>
> If the new version is not listed, click the Back button on your browser
> twice, then go back to step 2, but this time choose a different mirror.
>
> One last reminder: This only works right before a release is officially
> announced.  These instructions cannot be used while a release is still
> in development.
>
> Thanks,
> Shawn
>
>


zkHost in solr.xml goes missing after SPLITSHARD using Collections API

2013-07-23 Thread Ali, Saqib
Hello all,

Every time I issue a SPLITSHARD using Collections API, the zkHost attribute
in the solr.xml goes missing. I have to manually edit the solr.xml to add
zkHost after every SPLITSHARD.

Any thoughts on what could be causing this?

Thanks.


Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-17 Thread Ali, Saqib
Thanks Erick!

I have added the instructions for running SolrCloud on Jboss:
http://wiki.apache.org/solr/SolrCloud%20using%20Jboss

I will refine the instructions further, and also post some screenshots.

Thanks.


On Sun, Jul 14, 2013 at 5:05 AM, Erick Erickson wrote:

> Done, sorry it took so long, hadn't looked at the list in a couple of days.
>
>
> Erick
>
> On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib  wrote:
> > username: saqib
> >
> >
> > On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib wrote:
> >
> >> Hello,
> >>
> >> Can you please add me to the ContributorsGroup? I would like to add
> >> instructions for setting up SolrCloud using Jboss.
> >>
> >> thanks.
> >>
> >>
>


Re: Where to specify numShards when startup up a cloud setup

2013-07-16 Thread Ali, Saqib
What does the solr.xml look like on the nodes?


On Tue, Jul 16, 2013 at 2:36 PM, Robert Stewart wrote:

> I want to script the creation of N solr cloud instances (on ec2).
>
> But it's not clear to me where I would specify the numShards setting.
> From the documentation, I see you can specify it on the "first node" you start
> up, OR alternatively, use the "collections" API to create a new collection
> - but in that case you need first at least one running SOLR instance.  I
> want to push all solr instances with similar configuration onto N instances
> and just run them with some number of shards pre-set somehow.  Where can I
> put numShards configuration setting?
>
> What I want to do:
>
> 1) push solr configuration to zookeeper ensemble using zkCli command-line
> tool.
> 2) create N instances of SOLR running on Ec2, pointing to the same
> zookeeper
> 3) start all SOLR instances which will become a cloud setup with M shards
> (where M
> Currently everything starts up with 1 shards, and N replicas.
>
> I already have one single collection pre-configured.
>
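
A sketch of the Collections API route mentioned in the question, assuming at least one Solr node is already running against the ZooKeeper ensemble and the configuration has been uploaded under a known name. The base URL, collection name, and config name used in the usage note are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE request with an explicit shard count."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # config set previously pushed to ZooKeeper
    })
    return f"{base_url}/admin/collections?{params}"
```

For example, create_collection_url("http://localhost:8983/solr", "mycoll", 3, 2, "myconf") yields a URL that can be requested once after the nodes are up. The other option described in the question, setting numShards on the first node that starts, is typically done with a -DnumShards system property on that node.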


Re: Clearing old nodes from zookeper without restarting solrcloud cluster

2013-07-15 Thread Ali, Saqib
Hello Luis,

I don't think that is possible. If you delete clusterstate.json from
zookeeper, you will need to restart the nodes.. I could be very wrong
about this

Saqib


On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo <
lcguerreroc...@gmail.com> wrote:

> I know that you can clear zookeeper's data directory using the CLI with the
> clear command, I just want to know if it's possible to update the cluster's
> state without wiping everything out. Anyone have any ideas/suggestions?
>
>
> On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo <
> lcguerreroc...@gmail.com> wrote:
>
> > Hi,
> >
> > Is there an easy way to clear zookeeper of all offline solr nodes without
> > restarting the cluster? We are having some stability issues and we think
> > it may be due to the leader querying old offline nodes.
> >
> > thank you,
> >
> > Luis Guerrero
> >
>
>
>
> --
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047
>


Re: Book contest idea - feedback requested

2013-07-15 Thread Ali, Saqib
Hello Alex,

This sounds like an excellent idea! :)

Saqib


On Mon, Jul 15, 2013 at 8:11 PM, Alexandre Rafalovitch
wrote:

> Hello,
>
> Packt Publishing has kindly agreed to let me run a contest with e-copies of
> my book as prizes:
> http://www.packtpub.com/apache-solr-for-indexing-data/book
>
> Since my book is about learning Solr and targeted at beginners and early
> intermediates, here is what I would like to do. I am asking for feedback on
> whether people on the mailing list like the idea or have specific
> objections to it.
>
> 1) The basic idea is to get Solr users to write and vote on what they find
> hard with Solr, especially in understanding the features (as contrasted
> with just missing ones).
> 2) I'll probably set it up as a User Voice forum, which has all the
> mechanisms for suggesting and voting on ideas, with an easier interface
> than Jira.
> 3) The top N voted ideas will get the books as prizes and I will try to
> fix/document/create JIRAs for those issues.
> 4) I am hoping to specifically reach out to the communities where Solr is a
> component and where they don't necessarily hang out on our mailing list. I
> am thinking SolrNet, Drupal, project Blacklight, Cloudera, CrafterCMS,
> SiteCore, Typo3, SunSpot, Nutch. Obviously, anybody and everybody from this
> list would be absolutely welcome to participate as well.
>
> Yes? No? Suggestions?
>
> Also, if you are a maintainer of one of the products/services/libraries that
> has Solr in it and want to reach out to your community yourself, I think it
> would be a lot better than if I did it. Contact me directly and I will let
> you know what template/FAQ I want you to include in the announcement
> message when it is ready.
>
> Thank you all in advance for the comments and suggestions.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2013-07-12 Thread Ali, Saqib
I am getting a java.lang.OutOfMemoryError: Requested array size exceeds VM
limit on certain queries.

Please advise:

19:25:02,632 INFO  [org.apache.solr.core.SolrCore] (http-oktst1509.company.tld/12.5.105.96:8180-9) [collection1] webapp=/solr path=/select params={sort=sent_date+asc&distrib=false&wt=javabin&version=2&rows=2147483647&df=text&fl=id&shard.url=12.5.105.96:8180/solr/collection1/&NOW=1373675102627&start=0&q=thread_id:1439513570014188310&isShard=true&fq=domain:company.tld+AND+owner:11782344&fsv=true} hits=1 status=0 QTime=1
19:25:02,637 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-oktst1509.company.tld/12.5.105.96:8180-2) null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
    at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit


Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-12 Thread Ali, Saqib
username: saqib


On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib  wrote:

> Hello,
>
> Can you please add me to the ContributorsGroup? I would like to add
> instructions for setting up SolrCloud using Jboss.
>
> thanks.
>
>


add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-12 Thread Ali, Saqib
Hello,

Can you please add me to the ContributorsGroup? I would like to add
instructions for setting up SolrCloud using Jboss.

thanks.


Re: preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
Thanks Walter. And the container..


On Thu, Jul 11, 2013 at 7:55 PM, Walter Underwood wrote:

> Embedded Zookeeper is only for dev. Production needs to run a ZK cluster.
>  --wunder
>
> On Jul 11, 2013, at 7:27 PM, Ali, Saqib wrote:
>
> > With the embedded Zookeeper or a separate Zookeeper? Also, have you run
> > into any issues with running SolrCloud on jetty?
> >
> >
> > On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal wrote:
> >
> >> We're running under jetty.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 11, 2013, at 6:06 PM, "Ali, Saqib" wrote:
> >>
> >>> 1) Jboss
> >>> 2) Jetty
> >>> 3) Tomcat
> >>> 4) Other..
> >>>
> >>> ?
> >>
>
>
>
>
>


Re: preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
With the embedded Zookeeper or a separate Zookeeper? Also, have you run
into any issues with running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal wrote:

> We're running under jetty.
>
> Sent from my iPhone
>
> On Jul 11, 2013, at 6:06 PM, "Ali, Saqib"  wrote:
>
> > 1) Jboss
> > 2) Jetty
> > 3) Tomcat
> > 4) Other..
> >
> > ?
>


preferred container for running SolrCloud

2013-07-11 Thread Ali, Saqib
1) Jboss
2) Jetty
3) Tomcat
4) Other..

?


SolrCloud on Jboss

2013-07-08 Thread Ali, Saqib
Hello,

Does anyone have step-by-step instructions for running SolrCloud on Jboss?

Thanks


Re: SolrJ and SolrCloud

2013-07-08 Thread Ali, Saqib
Thanks Mark!


On Mon, Jul 8, 2013 at 10:46 AM, Mark Miller  wrote:

>
> On Jul 8, 2013, at 1:40 PM, "Ali, Saqib"  wrote:
>
> > Hello all,
> >
> > We have an app that uses the SolrJ and instantiates using HttpSolrServer.
> >
> > Now that we would like to move to SolrCloud, can we still use the same app,
> > or do we HAVE to switch to
> >
> > CloudSolrServer server = new CloudSolrServer("?");
> >
> > right away?
> >
> > Or will pointing to one instance using HttpSolrServer suffice for now?
>
> Yes, it will.
>
> - Mark
>
> >
> > Thanks.
>
>


SolrJ and SolrCloud

2013-07-08 Thread Ali, Saqib
Hello all,

We have an app that uses the SolrJ and instantiates using HttpSolrServer.

Now that we would like to move to SolrCloud, can we still use the same app,
or do we HAVE to switch to

CloudSolrServer server = new CloudSolrServer("?");

right away?

Or will pointing to one instance using HttpSolrServer suffice for now?

Thanks.


Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Thanks Jason! That was very helpful.

I read on the solr wiki that:
"Documents must have a unique key and the unique key must be stored
(stored="true" in schema.xml)"

What is this unique key? Is this just an id that we define in the schema.xml
that is unique to all documents? We have something as follows:


Will this suffice?



Thanks.

On Fri, Jul 5, 2013 at 7:45 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Saqib:
>
> At the simplest level:
>
> 1)  Source the machine
> 2)  Install Java
> 3)  Install a servlet container of your choice
> 4)  Copy your Solr WAR and conf directories as desired (probably a rough
> mirror of your current single server)
> 5)  Start it up and start sending data there
> 6)  Query both by simply adding:
>  shards=host1/solr/collection,host2/solr/collection
> 7)  Profit
>
> Or, in shorthand:
>
> 1)  Install new Solr instance and start indexing data there
> 2)  Add the shards parameter to your queries with both (or more) servers
> 3)  …
> 4)  Profit
>
> Now…we usually want to be concerned about how to manage the data so that
> we don't send duplicates.  Without SolrCloud it is our responsibility to
> delegate traffic for updates and deletes.  We also like to think a bit more
> about how to take advantage of our lovely parallelism to improve indexing or
> query speed.  We should also consider strategies to isolate domain data to
> single shards so as to allow isolated queries against dedicated data models
> in single shards.
>
> But if you just want the basics, it really is as easy as described above.
>
> Jason
>
>
> On Jul 5, 2013, at 7:36 PM, "Ali, Saqib"  wrote:
>
> > Hello Otis,
> >
> > I was thinking more in terms of Solr DistributedSearch rather than
> > SolrCloud. I was hoping to add another Solr instance when the time comes.
> > This is a low-use application, but with a lot of data. Uptime and query
> > speed are not important. However, we would like to be able to index more
> > than 2.1 billion documents when the time comes.
> >
> > Any advice will be highly appreciated.
> >
> >
> > Thanks!!! :)
> > Saqib
> >
> >
> > On Fri, Jul 5, 2013 at 6:23 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com
> >> wrote:
> >
> >> Hi,
> >>
> >> It's a broad question, but it starts with getting a few servers,
> >> putting Solr 4.3.1 on it (soon 4.4), setting up Zookeeper, creating a
> >> Solr Collection (index) with N shards and M replicas, and reindexing
> >> your old data to this new cluster, which you can expand with new nodes
> >> over time.  If you have specific questions...
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support -- http://sematext.com/
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >>
> >>
> >> On Fri, Jul 5, 2013 at 8:42 PM, Ali, Saqib 
> wrote:
> >>> Question regarding the 2.1 billion+ document.
> >>>
> >>> I understand that a single instance of solr has a limit of 2.1 billion
> >>> documents.
> >>>
> >>> We currently have a single solr server. If we reach 2.1billion documents
> >>> limit, what is involved in moving to the Solr DistributedSearch?
> >>>
> >>> Thanks! :)
> >>
>
>
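
A minimal sketch of step 6 in Jason's list: the same select request, with a shards parameter naming every instance to search. The host and path strings are the placeholders from his message, and the function name is made up. As the wiki quote above says, the unique key has to be unique across all the shards listed.

```python
from urllib.parse import urlencode


def distributed_select_url(query_host, shards, q, rows=10):
    """Build a select URL that fans the query out to every listed shard."""
    params = urlencode(
        {"q": q, "rows": rows, "shards": ",".join(shards)},
        safe="/,:",  # keep the shard list readable instead of percent-encoding it
    )
    return f"http://{query_host}/select?{params}"


# Placeholder shard addresses taken from the message above.
SHARDS = ["host1/solr/collection", "host2/solr/collection"]
```

Any of the instances can receive the request; it queries the listed shards and merges the results, which is why no SolrCloud or ZooKeeper setup is needed for this approach.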


Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Hello Otis,

I was thinking more in terms of Solr DistributedSearch rather than
SolrCloud. I was hoping to add another Solr instance when the time comes.
This is a low-use application, but with a lot of data. Uptime and query
speed are not important. However, we would like to be able to index more
than 2.1 billion documents when the time comes.

Any advice will be highly appreciated.


Thanks!!! :)
Saqib


On Fri, Jul 5, 2013 at 6:23 PM, Otis Gospodnetic  wrote:

> Hi,
>
> It's a broad question, but it starts with getting a few servers,
> putting Solr 4.3.1 on it (soon 4.4), setting up Zookeeper, creating a
> Solr Collection (index) with N shards and M replicas, and reindexing
> your old data to this new cluster, which you can expand with new nodes
> over time.  If you have specific questions...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jul 5, 2013 at 8:42 PM, Ali, Saqib  wrote:
> > Question regarding the 2.1 billion+ document.
> >
> > I understand that a single instance of solr has a limit of 2.1 billion
> > documents.
> >
> > We currently have a single solr server. If we reach 2.1billion documents
> > limit, what is involved in moving to the Solr DistributedSearch?
> >
> > Thanks! :)
>


2.1billion+ document

2013-07-05 Thread Ali, Saqib
Question regarding the 2.1 billion+ document.

I understand that a single instance of solr has a limit of 2.1 billion
documents.

We currently have a single solr server. If we reach 2.1billion documents
limit, what is involved in moving to the Solr DistributedSearch?

Thanks! :)


Re: [Announcement] Norch- a search engine for node.js

2013-07-05 Thread Ali, Saqib
Very interesting. What is the upper limit on the number of documents?

Thanks! :)


On Fri, Jul 5, 2013 at 11:53 AM, Fergus McDowall
wrote:

> Here is some news that might be of interest to users and implementers of
> Solr
>
>
> http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/
>
> Norch (http://fergiemcdowall.github.io/norch/) is a search engine written
> for Node.js. Norch uses the Node search-index module which is in turn
> written using the super fast levelDB library that Google open-sourced in
> 2011.
>
> The aim of Norch is to make a simple, fast search server, that requires
> minimal configuration to set up. Norch sacrifices complex functionality for
> a limited robust feature set, that can be used to set up a free test search
> engine for most enterprise scenarios.
>
> Currently Norch features
>
> Full text search
> Stopword removal
> Faceting
> Filtering
> Relevance weighting (tf-idf)
> Field weighting
> Paging (offset and resultset length)
>
> Norch can index any data that is marked up in the appropriate JSON format
>
> Download the first release of Norch (0.2.1) here (
> https://github.com/fergiemcdowall/norch/releases)
>


solrj distributed solr example

2013-07-05 Thread Ali, Saqib
Hello all,

Can anyone please share a solrj example for distributed solr?

Thanks! :)


Re: Moving from single Solr instance to Solr Cloud

2013-07-04 Thread Ali, Saqib
Hello Furkan,

We are using Solr 4.3

Thanks


On Thu, Jul 4, 2013 at 1:43 AM, Furkan KAMACI wrote:

> Which version of Solr are you using?
>
> 2013/7/4 Ali, Saqib 
>
> > We have a single Solr instance with a lot of indexed documents. Now we would
> > like to move to a SolrCloud implementation.
> >
> > Can we move the existing index to SolrCloud? If so, how? Or do we need to
> > reindex our data in SolrCloud?
> >
> > Thanks,
> > Saqib
> >
>


Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Ali, Saqib
So in this case, since the field type is String, does adding
omitTermFreqAndPositions="true" really help in reducing the index size?





On Wed, Jul 3, 2013 at 10:00 PM, Jack Krupansky wrote:

> Oops... I wasn't reading carefully enough - frequencies and positions only
> relate to tokenized fields (text) - not string fields.
>
> That doesn't impact your ability to do AND and OR of discrete string terms
> of a multivalued string field.
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Krupansky
> Sent: Thursday, July 04, 2013 12:54 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
> Yes, but it is simply doing an AND or OR of the individual terms - no
> phrases or implied ordering of the terms.
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Thursday, July 04, 2013 12:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
> Jack,
>
> Thanks for the explanation! :
>
> We have a multi-value field as following:
>  multiValued="true"/>
>
> Most of these labels are phrases of two or more words, e.g.
> 1) Google Reader
> 2) Google Mail
> 3) Google Cloud Storage
>
> etc. etc.
>
> if we add omitTermFreqAndPositions="true" to this field:
>  multiValued="true" omitTermFreqAndPositions="true"/>
>
> Will we be able to execute queries like:
> label: (Google Cloud Storage) ?
>
> Thanks.
>
>
>
>
> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>
>> Use it if you have a text field and simply want to be able to query whether
>> individual terms are present in the text, without needing to know either
>> how frequently the terms occur or whether some terms are present in
>> phrases. You can then do AND and OR for individual terms in that field, but
>> not phrases, and there is no scoring difference whether a term occurs once
>> or a thousand times in that field for each document. A lot less information
>> needs to be stored in the index.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Ali, Saqib
>> Sent: Wednesday, July 03, 2013 10:31 PM
>> To: solr-user@lucene.apache.org
>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>
>>
>> Hello,
>>
>> Can anyone please explain omitTermFreqAndPositions="true" to me in easy
>> English, please?
>>
>> Thanks.
>>
>>
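
A toy illustration of Jack's explanation, not how Lucene stores its index. It builds a tiny inverted index for a tokenized text field with and without positions: without them, AND/OR over individual terms still works, but a phrase match has nothing to check word order against. All names and sample documents are made up. As noted later in the thread, this only concerns tokenized (text) fields, not string fields.

```python
from collections import defaultdict


def build_index(docs, with_positions):
    """Map term -> {doc_id: [positions]}; positions stay empty when omitted."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            positions = index[term].setdefault(doc_id, [])
            if with_positions:
                positions.append(pos)
    return index


def match_all(index, terms):
    """Boolean AND: documents that contain every term, in any order."""
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings) if postings else set()


def match_phrase(index, terms):
    """Phrase match: terms must occur at consecutive positions."""
    hits = set()
    for doc_id in match_all(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(start + i in index[term][doc_id]
                   for i, term in enumerate(terms))
               for start in starts):
            hits.add(doc_id)
    return hits


DOCS = {1: "Google Cloud Storage", 2: "Cloud Google Storage"}
QUERY = ["google", "cloud", "storage"]
```

With positions, match_phrase finds only document 1; with positions omitted it finds nothing, while match_all returns both documents in either case.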


Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Ali, Saqib
sorry change the query to:
label:  (Google AND Cloud AND Storage)

or will Solr add AND / OR behind the scenes?


On Wed, Jul 3, 2013 at 9:59 PM, Ali, Saqib  wrote:

> So do I have to change my query to
> label: (Google Cloud Storage) ?
>
> or will Solr add AND / OR behind the scenes?
>
>
> On Wed, Jul 3, 2013 at 9:54 PM, Jack Krupansky wrote:
>
>> Yes, but it is simply doing an AND or OR of the individual terms - no
>> phrases or implied ordering of the terms.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Ali, Saqib
>> Sent: Thursday, July 04, 2013 12:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>>
>>
>> Jack,
>>
>> Thanks for the explanation! :
>>
>> We have a multi-value field as following:
>> > multiValued="true"/>
>>
>> Most of these labels are phrases of two or more words, e.g.
>> 1) Google Reader
>> 2) Google Mail
>> 3) Google Cloud Storage
>>
>> etc. etc.
>>
>> If we add omitTermFreqAndPositions="true" to this field:
>> > multiValued="true" omitTermFreqAndPositions="true"/>
>>
>> Will we be able to execute queries like:
>> label: (Google Cloud Storage) ?
>>
>> Thanks.
>>
>>
>>
>>
>> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>>
>>> Use it if you have a text field and simply want to be able to query
>>> whether individual terms are present in the text, without needing to know
>>> either how frequently the terms occur or whether some terms are present in
>>> phrases.
>>> So, you can do AND and OR for individual terms in that field, but not
>>> phrases, and there is no scoring difference whether a term occurs once
>>> or a thousand times in that field for each document. A lot less
>>> information needs to be stored in the index.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Ali, Saqib
>>> Sent: Wednesday, July 03, 2013 10:31 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>>
>>>
>>> Hello,
>>>
>>> Can anyone please explain omitTermFreqAndPositions="true" to me in
>>> easy
>>> English, please?
>>>
>>> Thanks.
>>>
>>>
>>
>
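
A note on the "AND / OR behind the scenes" question: with the standard lucene query parser, bare terms inside the parentheses are combined using the default operator, which is OR unless it has been changed. A minimal sketch of the two usual ways to change it in Solr 4.x follows; the host and path in the commented URL are placeholders, not values from this thread.

```xml
<!-- schema.xml: default operator for queries against this schema.
     Deprecated in favour of the q.op request parameter, but still read by 4.x. -->
<solrQueryParser defaultOperator="AND"/>

<!-- Per-request alternative (placeholder host, shown unencoded for readability):
     http://localhost:8983/solr/select?q=label:(Google Cloud Storage)&q.op=AND -->
```

Writing the operators explicitly, as in label: (Google AND Cloud AND Storage), avoids depending on either setting.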


Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Ali, Saqib
So do I have to change my query to
label: (Google Cloud Storage) ?

or will Solr add AND / OR behind the scenes?


On Wed, Jul 3, 2013 at 9:54 PM, Jack Krupansky wrote:

> Yes, but it is simply doing an AND or OR of the individual terms - no
> phrases or implied ordering of the terms.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Thursday, July 04, 2013 12:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
>
> Jack,
>
> Thanks for the explanation!
>
> We have a multi-value field as following:
>  multiValued="true"/>
>
> Most of these labels are phrases of two or more words, e.g.
> 1) Google Reader
> 2) Google Mail
> 3) Google Cloud Storage
>
> etc. etc.
>
> If we add omitTermFreqAndPositions="true" to this field:
>  multiValued="true" omitTermFreqAndPositions="true"/>
>
> Will we be able to execute queries like:
> label: (Google Cloud Storage) ?
>
> Thanks.
>
>
>
>
> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>
>> Use it if you have a text field and simply want to be able to query whether
>> individual terms are present in the text, without needing to know either
>> how frequently the terms occur or whether some terms are present in
>> phrases.
>> So, you can do AND and OR for individual terms in that field, but not
>> phrases, and there is no scoring difference whether a term occurs once or
>> a thousand times in that field for each document. A lot less information
>> needs to be stored in the index.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Ali, Saqib
>> Sent: Wednesday, July 03, 2013 10:31 PM
>> To: solr-user@lucene.apache.org
>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>
>>
>> Hello,
>>
>> Can anyone please explain omitTermFreqAndPositions="true" to me in
>> easy
>> English, please?
>>
>> Thanks.
>>
>>
>


Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Ali, Saqib
Jack,

Thanks for the explanation!

We have a multi-value field as following:


Most of these labels are phrases of two or more words, e.g.
1) Google Reader
2) Google Mail
3) Google Cloud Storage

etc. etc.

if we add omitTermFreqAndPositions="true" to this field:


Will we be able to execute queries like:
label: (Google Cloud Storage) ?

Thanks.




On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:

> Use it if you have a text field and simply want to be able to query whether
> individual terms are present in the text, without needing to know either how
> frequently the terms occur or whether some terms are present in phrases.
> So, you can do AND and OR for individual terms in that field, but not
> phrases, and there is no scoring difference whether a term occurs once or a
> thousand times in that field for each document. A lot less information
> needs to be stored in the index.
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Wednesday, July 03, 2013 10:31 PM
> To: solr-user@lucene.apache.org
> Subject: omitTermFreqAndPositions="true" in easy English, please?
>
>
> Hello,
>
> Can anyone please explain omitTermFreqAndPositions="true" to me in easy
> English, please?
>
> Thanks.
>
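
For reference, a field declared with this option might look like the sketch below. Only the name label (taken from the query in the message) and the multiValued and omitTermFreqAndPositions attributes come from the thread; the type text_general and the indexed/stored values are assumptions made for illustration.

```xml
<!-- Hypothetical schema.xml fragment. type, indexed and stored are assumed;
     name, multiValued and omitTermFreqAndPositions follow the thread. -->
<field name="label" type="text_general" indexed="true" stored="true"
       multiValued="true" omitTermFreqAndPositions="true"/>
```

With a field like this, term queries such as label: (Google AND Cloud AND Storage) can still match, because the index records which documents contain each term. A quoted phrase such as label:"Google Cloud Storage" needs position data, which this option leaves out of the index.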


Re: Use case indexed="false" stored="false" field

2013-07-03 Thread Ali, Saqib
Thank you Shawn for the excellent use case. :)


On Wed, Jul 3, 2013 at 9:34 AM, Shawn Heisey  wrote:

> On 7/3/2013 9:22 AM, Ali, Saqib wrote:
>
>> What would be the use case for such a field:
>>
>>  > stored="false"/>
>>
>>
>> and
>>
>>  > stored="false"/>
>>
>
> I have a field like this in my schema. That field is used as one of the
> source fields that get copied to my "catchall" field.  I don't need the
> field by itself, but I use it in conjunction with other fields.
>
> If I can get the app developers to switch over to using edismax more, I
> will get rid of the catchall field and then set that field to indexed and
> not stored.
>
> Thanks,
> Shawn
>
>
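
The arrangement Shawn describes could look roughly like the following sketch. The field names keywords and catchall and the type text_general are hypothetical; the thread only says that an unindexed, unstored field is used as a copyField source for a catchall field.

```xml
<!-- Hypothetical schema.xml fragment: the source field keeps nothing itself... -->
<field name="keywords" type="text_general" indexed="false" stored="false"/>
<!-- ...but its incoming value is copied into a searchable catchall field. -->
<field name="catchall" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="keywords" dest="catchall"/>
```

copyField works on the incoming field value before analysis, so the source field does not need to be indexed or stored for the copy to happen.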


Re: unused fields in Solr schema.xml increase the index size

2013-07-03 Thread Ali, Saqib
Thanks Jack! That was very helpful.


On Wed, Jul 3, 2013 at 9:54 AM, Jack Krupansky wrote:

> If never used, they take up zero space in the index.
>
> If they were used but are no longer used, they're still there, but any new
> or replaced documents will not take up any space for the unused fields
> (subject to the fact that deleted fields still exist until a
> merge/optimize compresses them away).
>
> But, yes, you should try to keep your schema clean - but if the fields
> are still populated in some of the documents, you might eventually find
> some need to reference them.
>
> You should keep your schema and config files in a version control system
> so that you can always go back or view differences.
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Wednesday, July 03, 2013 11:55 AM
> To: solr-user@lucene.apache.org
> Subject: unused fields in Solr schema.xml increase the index size
>
>
> Hello all,
>
> Do unused fields in Solr schema.xml increase the size of the index files?
>
> Should we be cleaning up those fields?
>
> Thanks.
>
> Saqib
>


omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Ali, Saqib
Hello,

Can anyone please explain omitTermFreqAndPositions="true" to me in easy
English, please?

Thanks.


Moving from single Solr instance to Solr Cloud

2013-07-03 Thread Ali, Saqib
We have a single Solr instance with a lot of indexed documents. Now we would
like to move to a SolrCloud implementation.

Can we move the existing index to SolrCloud? If so, how? Or do we need to
reindex our data in SolrCloud?

Thanks,
Saqib


unused fields in Solr schema.xml increase the index size

2013-07-03 Thread Ali, Saqib
Hello all,

Do unused fields in Solr schema.xml increase the size of the index files?

Should we be cleaning up those fields?

Thanks.

Saqib


Re: Use case indexed="false" stored="false" field

2013-07-03 Thread Ali, Saqib
very interesting. thank you all for the explanation!!! :)


On Wed, Jul 3, 2013 at 8:32 AM, Jack Krupansky wrote:

> Setting both indexed and stored to false means to ignore input values for
> that field.
>
> The effective use case is that these fields may have values in the update
> input stream and they will be ignored. Without these field definitions,
> those same field values would cause exceptions - references to undefined
> fields. In other words, you are telling Solr that it is okay to have inputs
> for these fields - simply ignore them.
>
> But... you could still have update processors that look at the values of
> "ignored" fields and maybe assign them to other, non-ignored fields.
>
> -- Jack Krupansky
>
> -Original Message- From: Ali, Saqib
> Sent: Wednesday, July 03, 2013 11:22 AM
> To: solr-user@lucene.apache.org
> Subject: Use case indexed="false" stored="false" field
>
>
> Hello all,
>
>
> What would be the use case for such a field:
>
> stored="false"/>
>
>
> and
>
>
>
>
> ?
>
>
> Thanks.
>
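
A sketch of the "ignore this input" pattern Jack describes, modelled on the ignored type that ships in the Solr example schema. The field name legacy_notes is hypothetical, and the catch-all dynamicField is optional.

```xml
<!-- A type that neither indexes nor stores whatever it is given. -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>

<!-- Hypothetical field: documents may supply legacy_notes without causing
     an "undefined field" error, and the value is simply dropped. -->
<field name="legacy_notes" type="ignored"/>

<!-- Optional: drop every field that is not declared anywhere else. -->
<dynamicField name="*" type="ignored" multiValued="true"/>
```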


Use case indexed="false" stored="false" field

2013-07-03 Thread Ali, Saqib
Hello all,


What would be the use case for such a field:




and




?


Thanks.


Re: copyField and storage requirements

2013-07-02 Thread Ali, Saqib
Thanks Shawn.

Here is the text_general type definition. We would like to bring the
storage requirement down to a minimum for those 500KB content documents. We
just need basic full-text search.

Thanks!!! :)





















On Tue, Jul 2, 2013 at 11:35 AM, Shawn Heisey  wrote:

> On 7/2/2013 12:22 PM, Ali, Saqib wrote:
> > Newbie question:
> >
> > We have the following fields defined in the schema:
> >
> > 
> > 
> > 
> >
> > The content field is about 500KB of data.
> >
> > My question is whether Solr stores the entire contents of that 500KB
> > content field?
> >
> > We want to minimize the stored data in the Solr index, which is why we
> > added the copyField teaser.
>
> With that config, the entire 500KB will not be _stored_ .. but it will
> affect the index size because you are indexing it.  Exactly what degree
> that will be depends on the definition of the text_general type.
>
> Thanks,
> Shawn
>
>
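
The field definitions from the original question are not shown above, so the following is only a sketch of a configuration that would behave the way Shawn describes. The names content and teaser come from the thread; the types, the indexed/stored values, and the maxChars limit of 300 are assumptions.

```xml
<!-- Hypothetical schema.xml fragment. content is searchable but not stored,
     so the 500KB body adds to the index size but is not kept verbatim. -->
<field name="content" type="text_general" indexed="true" stored="false"/>

<!-- teaser is stored for display only; maxChars caps how much is copied. -->
<field name="teaser" type="string" indexed="false" stored="true"/>
<copyField source="content" dest="teaser" maxChars="300"/>
```

With this layout, search runs against content while results display the short stored teaser. How much the indexed content adds to the index size depends on the analysis chain of its field type.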


copyField and storage requirements

2013-07-02 Thread Ali, Saqib
Newbie question:

We have the following fields defined in the schema:





The content field is about 500KB of data.

My question is whether Solr stores the entire contents of that 500KB
content field?

We want to minimize the stored data in the Solr index, which is why we added
the copyField teaser.

Thanks
Saqib


Re: Storing Solr Index on NFS

2013-04-15 Thread Ali, Saqib
Hello Walter,

Thanks for the response. That has been my experience in the past as well.
But I was wondering if there are new things in Solr 4 and NFS 4.1 that make
storing indexes on an NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wrote:

> On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:
>
> > Greetings,
> >
> > Are there any issues with storing Solr indexes on an NFS share? Also, any
> > recommendations for using NFS for Solr indexes?
>
> I recommend that you do not put Solr indexes on NFS.
>
> It can be very slow; I measured indexing as 100X slower on NFS a few years
> ago.
>
> It is not safe to share Solr index files between two Solr servers, so
> there is no benefit to NFS.
>
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Storing Solr Index on NFS

2013-04-15 Thread Ali, Saqib
Greetings,

Are there any issues with storing Solr indexes on an NFS share? Also, any
recommendations for using NFS for Solr indexes?

Thanks,
Saqib


Re: secure deployment of solr.war on jboss

2013-04-01 Thread Ali, Saqib
Thanks. Are you using an iptables firewall on the JBoss host to prevent access
from other systems? Or are you using some JBoss configuration for that?

Thanks,
Saqib


On Mon, Apr 1, 2013 at 6:25 AM, adityab  wrote:

> Hi Ali,
>
> We have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT
> Administration and our front-end application server are able to access the
> Solr servers in production.
>
>
>
>
>


secure deployment of solr.war on jboss

2013-03-31 Thread Ali, Saqib
Hello all,

We are using Apache Solr 4.2 in our application to provide search
capabilities. We are deploying the solr.war file to jboss along with our
application.

Any suggestions on proper security controls for this type of solr setup?

Also, Solr is now accessible to everyone at the http://jboss_host/solr URL.
How can we prevent /solr/ from being accessible from all IP addresses? We
would like to restrict access to certain IP addresses, namely the jboss_host
and a couple of other management API hosts.

Any help will be much appreciated.

Thanks,
Saqib


Re: What is the graceful shutdown API for Solrj embedded?

2013-02-07 Thread Ali, Saqib
Hello Alex,

I asked a similar question on Server Fault:
http://serverfault.com/a/474442/156440


On Wed, Feb 6, 2013 at 7:05 PM, Alexandre Rafalovitch wrote:

> Hello,
>
> When I CTRL-C the example Solr, it prints a bunch of graceful shutdown
> messages.  I assume it shuts down safe and without corruption issues.
>
> When I do that to Solrj (embedded, not remote), it just drops dead.
>
> I found CoreContainer.shutdown(), which looks about right and does
> terminate Solrj but it prints out a completely different set of messages.
>
> Is CoreContainer.shutdown() the right method for Solrj (4.1)? Is there more
> than just one call?
>
> And what happens if you just Ctrl-C a Solrj instance? The wiki says nothing
> about shutdown, so I can imagine a lot of people probably think it is ok to
> just kill it. Is there a danger of corruption?
>
> Regards,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>


Re: Configuring the jetty shipped with Solr

2013-02-05 Thread Ali, Saqib
Thanks Alex.

I was able to bind jetty to 127.0.0.1 so that it only accepts connections
from localhost using the following:

But how do I set it so that it can accept connections from certain
non-localhost IP addresses as well?

Thanks.



On Mon, Feb 4, 2013 at 5:06 PM, Alexandre Rafalovitch wrote:

> I believe that, in the example directory (that is, relative to start.jar),
> the contexts directory has the URL mapping for Solr (/solr), etc has some
> global Jetty properties, and solr-webapp/webapp/WEB-INF contains some
> Solr-specific Jetty configuration.
>
> Beware, however, that the last one is a decompressed version of
> webapps/solr.war. I don't know whether it ever gets overridden after the
> first time it is decompressed.
>
> No idea where the actual IP address directive is, though.
>
> Regards,
>Alex.
>
>
>
> On Mon, Feb 4, 2013 at 6:41 PM, Ali, Saqib  wrote:
>
> > Hello all,
> >
> > How do I change the configuration for the Jetty that is shipped with
> > Apache Solr? Where are the configuration files located? I want to restrict
> > the IP addresses that can connect to that instance of Solr.
> >
> > Thanks,
> > Saqib
> >
>


Configuring the jetty shipped with Solr

2013-02-04 Thread Ali, Saqib
Hello all,

How do I change the configuration for the Jetty that is shipped with Apache
Solr? Where are the configuration files located? I want to restrict the IP
addresses that can connect to that instance of Solr.

Thanks,
Saqib