Re: Time of insert

2017-02-06 Thread Mahmoud Almokadem
Thanks Alex for your reply. But the field created_date will be updated
every time the document is inserted into Solr. I want to record the first
time the document was indexed in Solr, and I'm using the DataImport handler.

And I tried solr.TimestampUpdateProcessorFactory but I got a
NullPointerException, so I changed it to use a default value for the field
in the schema:

<field name="created_date" type="date" indexed="true" stored="true" default="NOW" />

but this field contains the last update time of the document, not the first
time the document was inserted.
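
For reference, a minimal wiring of solr.TimestampUpdateProcessorFactory in
solrconfig.xml looks like the sketch below (the chain name here is made up,
and the chain must be referenced from the update handler or via update.chain
for it to run):

  <updateRequestProcessorChain name="add-created-date">
    <processor class="solr.TimestampUpdateProcessorFactory">
      <str name="fieldName">created_date</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Note that a full DIH reindex replaces the whole document, so even this
timestamp is reset on every import, which is exactly the problem described
above.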


Thanks,
Mahmoud

On Tue, Feb 7, 2017 at 12:10 AM, Alexandre Rafalovitch 
wrote:

> If you are reindexing full documents, there is no way.
>
> If you are actually doing updates using Solr updates XML/JSON, then
> you can have a created_date field with default value of NOW.
> Similarly, you could probably do something with UpdateRequestProcessor
> chains to get that NOW added somewhere.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 6 February 2017 at 15:32, Mahmoud Almokadem 
> wrote:
> > Hello,
> >
> > I'm using dih on solr 6 for indexing data from sql server. The document
> can
> > be indexed many times according to the updates on it. Is that available
> to
> > get the first time the document inserted to solr?
> >
> > And how to get the dates of the document updated?
> >
> > Thanks for help,
> > Mahmoud
>


Re: How to combine third party search data as top results ?

2017-02-06 Thread shamik
Charlie, this looks very close to what I'm looking for. Just wondering if
you've made this available as a jar, or whether it can be built from
source? Our Solr distribution is not built from source, so I can only use an
external jar. I'd appreciate it if you could let me know.





Re: Switching from Managed Schema to Manually Edited schema.xml --IS NOT WORKING

2017-02-06 Thread Erick Erickson
You did not answer whether you uploaded your configs to ZooKeeper and
reloaded the collection. Sharing config files will not help you with
that.

What I'd advise:

First get it working in stand-alone mode without Solr cloud at all.
That should be quite simple, all on your local machine. Then migrate
to SolrCloud so you're only changing one thing at a time.

Best,
Erick

On Mon, Feb 6, 2017 at 9:54 AM, Anatharaman, Srinatha (Contractor)
 wrote:
> Erick,
>
> I did as mentioned in that URL, made changes to solrconfig and kept only 
> required fields in schema.xml
> Would you mind sharing config files for indexing text document?
>
> Regards,
> ~Sri
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, February 06, 2017 12:22 AM
> To: solr-user 
> Subject: Re: Switching from Managed Schema to Manually Edited schema.xml --IS 
> NOT WORKING
>
> This is still using the managed schema specifically the data_driven_configs 
> schema as evidenced by the add-unknown-field-to-the-schema part of the URL.
>
> It looks like you're not _really_ removing the managed schema definitions 
> from your solrconfig.xml. You must
> 1> change solrconfig.xml
> 2> push it to ZooKeeper
> 3> reload the collection
>
> before the config changes actually take effect.
>
> Best,
> Erick
>
> On Sun, Feb 5, 2017 at 9:05 PM, Anatharaman, Srinatha (Contractor) 
>  wrote:
>> Hi ,
>>
>> I am indexing a Text document and followed the steps defined in below
>> URL to create the schema.xml
>> https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Defini
>> tion+in+SolrConfig#SchemaFactoryDefinitioninSolrConfig-SwitchingfromMa
>> nagedSchematoManuallyEditedschema.xml
>>
>> After making the above changes, when I try to index the document using the
>> curl command I get the below error:
>>
>> <response><lst name="responseHeader"><int name="status">400</int>
>> <int name="QTime">147</int></lst><lst name="error"><lst name="metadata">
>> <str name="error-class">org.apache.solr.common.SolrException</str>
>> <str name="root-error-class">org.apache.solr.common.SolrException</str>
>> <str name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str>
>> <str name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str></lst>
>> <str name="msg">Async exception during distributed update: Bad Request</str>
>>
>>
>>
>> request:
>> http://165.137.46.219:8983/solr/gsearch_shard1_replica2/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=http%3A%2F%2F165.137.46.218%3A8983%2Fsolr%2Fgsearch_shard2_replica1%2F&wt=javabin&version=2
>> <int name="code">400</int></lst></response>
>>
>> Could someone help me to resolve this issue? How do I create a
>> schema.xml file for a text document (document content varies for each
>> file)? I want to index the entire document as a whole and search on the
>> document content.
>>
>> Thanks & Regards,
>> ~Sri
>>
>>
>


Re: Issues with uniqueKey != id?

2017-02-06 Thread Erik Hatcher
Personally I'd leave it as "id" and adjust your other domain-specific field
name to something else.  Why?  To keep Solr and other potential tools from
having issues.  I don't know exactly what may break, but I'd rather keep
things straightforward.

   Erik

> On Feb 6, 2017, at 02:33, Matthias X Falkenberg  wrote:
> 
> Hi Susheel,
> 
> My question is about the name of the "uniqueKey" field rather than the 
> composition of its values. By default, Solr uses a field with the name 
> "id". For reasons of ambiguity with the applications in my environment, I 
> am considering changing the field name to, for example, "docId". Is that 
> what you have also done for your compound keys?
> 
> One important aspect to consider after using a "uniqueKey" with a 
> different name is 
> http://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html
> : "This class assumes the id field for your documents is called 'id' - if 
> this is not the case, you must set the right name with 
> setIdField(String)."
> 
> I am wondering whether there are more details or pitfalls that I should be 
> aware of?
> 
> Mit freundlichen Grüßen / Kind regards,
> 
> Matthias Falkenberg
> 
> Team Lead - IBM Digital Experience Development
> IBM Watson Content Hub, IBM WebSphere Portal, IBM Web Content Manager
> IBM Deutschland Research & Development GmbH / Vorsitzende des 
> Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
> HRB 243294
> 
> 
> 
> From:   Susheel Kumar 
> To: solr-user@lucene.apache.org
> Date:   05-02-17 03:21 AM
> Subject:Re: Issues with uniqueKey != id?
> 
> 
> 
> Hello,
> 
> So far in my experience I haven't come across a scenario where a unique
> key/id is not required.  Most of the time, I have put together a
> combination of a few fields as aggregate or compound keys (e.g.
> organization_id + employee_id etc.).  The reason it makes sense to have
> some form of unique key is twofold:
> a) if there is no unique key, it becomes impossible to update any
> existing records since you can't uniquely identify them, which means your
> index will keep growing
> b) if there is no unique key, then when you return search results you
> wouldn't have anything to relate them to other/external systems
> 
> Sometimes you may have time-series data, in which case a timestamp or a
> combination of timestamp / other fields may make sense as the key, but
> yes, a unique key is not mandatory.
> 
> Thanks,
> Susheel
> 
> On Fri, Feb 3, 2017 at 11:49 AM, Matthias X Falkenberg 
> 
> wrote:
> 
>> Howdy,
>> 
>> In the Solr Wiki I stumbled upon a somewhat vague statement on the
>> uniqueKey:
>> 
>>> https://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
>>> It shouldn't matter whether you rename this to something else (and
>> change the <uniqueKey> value), but occasionally it has in the past. We
>> recommend that you just leave this definition alone.
>> 
>> I'd be very grateful for any positive or negative experiences with
>> "uniqueKey" not being set to "id" - especially if your experiences are
>> related to Solr 6.2.1+.
>> 
>> Many thanks,
>> 
>> Matthias Falkenberg
>> 
>> IBM Deutschland Research & Development GmbH / Vorsitzende des
>> Aufsichtsrats: Martina Koederitz
>> Geschäftsführung: Dirk Wittkopp
>> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
> Stuttgart,
>> HRB 243294
>> 
>> 
> 
> 
> 
> 


Re: Find groups where at least one item matches a query

2017-02-06 Thread Joel Bernstein
Assuming you have a unique id for each document the graph expression below
will get you what you're looking for.

The nodes function is short for gatherNodes described in the docs (
https://cwiki.apache.org/confluence/display/solr/Graph+Traversal). Starting
in 6.4 you can call the function "nodes" or "gatherNodes".

nodes(collection,
      nodes(collection,
            walk="Normal->pathology",
            gather="groupId"),
      walk="node->groupId",
      gather="id",
      trackTraversal="true")


The inner nodes function retrieves all the groupId's where Normal is the
pathology. The outer nodes function takes the groupId's emitted from the
inner nodes function and retrieves all the document id's in these groups.
trackTraversal=true on the outer nodes function will include the "groupId"
in the ancestor field of each doc id.

You'll find that once you master the graph expression syntax you'll be able
to do all kinds of interesting graph queries on the data set you've
described, which is really best treated as a graph.
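
To run it, the expression is sent to the collection's /stream handler; a
curl sketch (the Solr address and collection name are placeholders):

curl --data-urlencode 'expr=nodes(collection,
        nodes(collection,
              walk="Normal->pathology",
              gather="groupId"),
        walk="node->groupId",
        gather="id",
        trackTraversal="true")' \
    "http://localhost:8983/solr/collection/stream"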



Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Feb 5, 2017 at 10:24 PM, Joel Bernstein  wrote:

> Take a look at the graph expressions:
>
> https://cwiki.apache.org/confluence/display/solr/Graph+Traversal
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sun, Feb 5, 2017 at 3:43 PM, Alexandre Rafalovitch 
> wrote:
>
>> What about collapse and expand with an overridden query? Something like
>> this (against the 6.4 techproducts example):
>> http://localhost:8983/solr/techproducts/select?expand.q=*:*&expand=true&fq={!collapse%20field=manu_id_s}&indent=on&q=name:CORSAIR&wt=json
>> 
>>
>> Note that the main document area contains the head document and the
>> expanded area contains the rest of them, up to provided/default limit.
>> For further info, see
>> https://cwiki.apache.org/confluence/display/solr/Collapse+
>> and+Expand+Results
>>
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>>
>> On 5 February 2017 at 14:55, Cristian Popovici
>>  wrote:
>> > Doesn't seem to work - I'm doing a query like this and I get only one
>> result
>> >
> > q=pathology:normal&group=true&group.field=groupId&group.limit=2
>> >
>> > On Sun, Feb 5, 2017 at 7:20 PM, Nick Vasilyev > >
>> > wrote:
>> >
>> >> Check out the group.limit argument.
>> >>
>> >> On Feb 5, 2017 12:10 PM, "Cristian Popovici" <
>> cristi.popov...@visionsr.com
>> >> >
>> >> wrote:
>> >>
>> >> > Erick, thanks for you answer.
>> >> >
>> >> > Sorry - I forgot to mention that I do not know the group id when I
>> >> perform
>> >> > the query.
>> >> > Grouping - I think - does not help for me as it filters out the
>> documents
>> >> > that do not meet the filter criteria.
>> >> >
>> >> > Example:
> >> > q=pathology:Normal&group=true&group.field=groupId  will miss out
> >> the
> >> > "pathology":
> >> > "Metastasis".
>> >> >
>> >> > I need to retrieve both documents in the same group even if only one
>> >> meets
>> >> > the search criteria.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > On Sun, Feb 5, 2017 at 6:54 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> >> > wrote:
>> >> >
> >> > > Isn't this just "fq=groupId:223"?
>> >> > >
>> >> > > Or do you mean you need multiple _groups_? In which case you can
>> use
>> >> > > grouping, see:
>> >> > > https://cwiki.apache.org/confluence/display/solr/
>> >> > > Collapse+and+Expand+Results
>> >> > > and/or
>> >> > > https://cwiki.apache.org/confluence/display/solr/Result+Grouping
>> >> > >
>> >> > > but do note there are some limitations in distributed mode.
>> >> > >
>> >> > > Best,
>> >> > > Erick
>> >> > >
>> >> > > On Sun, Feb 5, 2017 at 1:49 AM, Cristian Popovici
>> >> > >  wrote:
>> >> > > > Hi all,
>> >> > > >
>> >> > > > I'm new to Solr and I need a bit of help.
>> >> > > >
>> >> > > > I have a structure of documents indexed in Solr that are grouped
>> >> > together
>> >> > > > by a property. I need to retrieve all groups where at least one
>> entry
>> >> > in
>> >> > > > the group matches a query.
>> >> > > >
>> >> > > > Example:
>> >> > > > I have two documents indexed and both share the *groupId
>> *property
>> >> that
>> >> > > > defines the grouping field.
>> >> > > >
>> >> > > > *{*
>> >> > > > *"groupId": "223",*
>> >> > > > *"modality": "Computed Tomography",*
>> >> > > > *"anatomy": "Subcutaneous fat",*
>> >> > > > *"pathology": "Metastasis",*
>> >> > > > *}*
>> >> > > >
>> >> > > > *{*
>> >> > > > *"groupId": "223",*
>> >> > > > *"modality": "Computed Tomography",*
>> >> > > > *"anatomy": "Subcutaneous fat",*
>> >> > > > *"pathology": "Normal",*
>> >> > > > *}*
>> >> > > >
> >> > > > I need to retrieve both documents even if only one matches the query.

Re: 回复: bin/post and self-signed SSL

2017-02-06 Thread Kevin Risden
I expect that the commands work the same, or very nearly so, from 5.5.x
through 6.4.x. There has been some cleanup of the bin/solr and bin/post
commands but not many security changes. If you find otherwise, then please
let us know.
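
In the meantime, since bin/post is a thin wrapper around SimplePostTool,
one workaround sketch is to invoke the class directly so that JVM
trust-store properties can be set (paths, password, and URL below are
placeholders, and this assumes the tool uses the default JVM SSL context):

java -Djavax.net.ssl.trustStore=/path/to/solr-ssl.keystore.jks \
     -Djavax.net.ssl.trustStorePassword=secret \
     -Dauto=yes -Durl=https://localhost:8983/solr/sslColl -Ddata=files \
     -classpath /opt/solr/dist/solr-core-6.4.0.jar \
     org.apache.solr.util.SimplePostTool lab-index.html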

Kevin Risden

On Feb 5, 2017 21:02, "alias" <524839...@qq.com> wrote:

> Do you mean this can only be used with version 5.5.x? Is it invalid for
> other versions?
>
>
>
>
> -- Original Message --
> From: "Kevin Risden";;
> Sent: Monday, February 6, 2017, 9:44 AM
> To: "solr-user";
>
> Subject: Re: bin/post and self-signed SSL
>
>
>
> Originally formatted as MarkDown. This was tested against Solr 5.5.x
> packaged as Lucidworks HDP Search. It should behave the same as stock Solr 5.5.x.
>
> # Using Solr
> *
> https://cwiki.apache.org/confluence/display/solr/Solr+
> Start+Script+Reference
> * https://cwiki.apache.org/confluence/display/solr/Running+Solr
> * https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> ## Create collection (w/o Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/solr create -c test
> ```
>
> ## Upload configuration directory (w/ SSL and Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh
> -zkhost ZK_CONNECTION_STRING -cmd upconfig -confname basic_config -confdir
> /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf
> ```
>
> ## Create Collection (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=1&replicationFactor=1&collection.configName=basic_config
> "
> ```
>
> ## Delete collection (w/o Kerberos)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/solr delete -c test
> ```
>
> ## Delete Collection (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=DELETE&name=newCollection
> "
> ```
>
> ## Adding some test docs (w/o SSL)
> ```bash
> /opt/lucidworks-hdpsearch/solr/bin/post -c test
> /opt/lucidworks-hdpsearch/solr/example/exampledocs/*.xml
> ```
>
> ## Adding documents (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/newCollection/update?commit=true" -H
> "Content-Type: application/json" --data-binary
> @/opt/lucidworks-hdpsearch/solr/example/exampledocs/books.json
> ```
>
> ## List Collections (w/ SSL and Kerberos)
> ```bash
> curl -k --negotiate -u : "
> https://SOLR_HOST:8983/solr/admin/collections?action=LIST"
> ```
>
> Kevin Risden
>
> On Sun, Feb 5, 2017 at 5:55 PM, Kevin Risden 
> wrote:
>
> > Last time I looked at this, there was no way to pass any Java properties
> > to the bin/post command. This made it impossible to even set the SSL
> > properties manually. I checked master just now and still there is no
> place
> > to enter Java properties that would make it to the Java command.
> >
> > I came up with a chart of commands previously that worked with standard
> > (no SSL or Kerberos), SSL only, and SSL with Kerberos. Only the standard
> > solr setup worked for the bin/solr and bin/post commands. Errors popped
> up
> > that I couldn't work around. I've been meaning to get back to it just
> > haven't had a chance.
> >
> > I'll try to share that info when I get back to my laptop.
> >
> > Kevin Risden
> >
> > On Feb 5, 2017 12:31, "Jan Høydahl"  wrote:
> >
> >> Hi,
> >>
> >> I’m trying to post a document to Solr using bin/post after enabling SSL
> >> with self signed certificate. Result is:
> >>
> >> $ post -url https://localhost:8983/solr/sslColl *.html
> >> /usr/lib/jvm/java-8-openjdk-amd64/bin/java -classpath
> >> /opt/solr/dist/solr-core-6.4.0.jar -Dauto=yes -Durl=
> >> https://localhost:8983/solr/sslColl -Dc= -Ddata=files
> >> org.apache.solr.util.SimplePostTool lab-index.html lab-ops1.html
> >> lab-ops2.html lab-ops3.html lab-ops4.html lab-ops6.html lab-ops8.html
> >> SimplePostTool version 5.0.0
> >> Posting files to [base] url https://localhost:8983/solr/sslColl...
> >> Entering auto mode. File endings considered are
> >> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,
> >> ods,ott,otp,ots,rtf,htm,html,txt,log
> >> POSTing file lab-index.html (text/html) to [base]/extract
> >> SimplePostTool: FATAL: Connection error (is Solr running at
> >> https://localhost:8983/solr/sslColl ?): javax.net.ssl.
> SSLHandshakeException:
> >> sun.security.validator.ValidatorException: PKIX path building failed:
> >> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> >> find valid certification path to requested target
> >>
> >>
> >> Do anyone know a workaround for letting bin/post accept self-signed
> cert?
> >> Have not tested it against a CA signed Solr...
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>


Re: Faceting and Grouping Performance Degradation in Solr 5

2017-02-06 Thread Solr User
I am pleased to report that we are in Production on Solr 5.5.3 with
comparable performance to Solr 4.8.1 through leveraging facet.method=uif as
well as https://issues.apache.org/jira/browse/SOLR-9176.  Thanks to
everyone who worked on these!
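
For anyone searching later: uif is requested per query (or via handler
defaults); a sketch, with collection and field names as placeholders:

curl "http://localhost:8983/solr/mycollection/select?q=*:*&facet=true&facet.field=category&facet.method=uif"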

On Mon, Oct 3, 2016 at 3:55 PM, Solr User  wrote:

> Below is some further testing.  This was done in an environment that had
> no other queries or updates during testing.  We ran through several
> scenarios; the results are reproduced as a table below.  The times are
> average times in milliseconds.  Same test methodology as above except there
> was a 5 minute warmup and a 15 minute test.
>
> Note that both the segment and deletions were recorded from only 1 out of
> 2 of the shards so we cannot try to extrapolate a function between them and
> the outcome.  In other words, just view them as "non-optimized" versus
> "optimized" and "has deletions" versus "no deletions".  The only exceptions
> are the 0 deletes were true for both shards and the 1 segment and 8 segment
> cases were true for both shards.  A few of the tests were repeated as well.
>
> The only conclusion that I could draw is that the number of segments and
> the number of deletes appear to greatly influence the response times, at
> least more than any difference in Solr version.  There also appears to be
> some external contributor to variance... maybe network, etc.
>
> Thoughts?
>
>
> Date        Solr    Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
> 9/29/2016   5.5.2   57873         34        YES               198          92
> 9/29/2016   5.5.2   57873         34        YES               210          88
> 9/29/2016   4.8.1   176958        18        N/A               145          59
> 9/30/2016   4.8.1   593694        27        N/A               186          62
> 9/30/2016   4.8.1   593694        27        N/A               190          58
> 9/30/2016   5.5.2   57873         34        YES               208          72
> 9/30/2016   5.5.2   57873         34        YES               209          70
> 9/30/2016   5.5.2   57873         34        NO                210          77
> 9/30/2016   5.5.2   57873         34        NO                206          74
> 9/30/2016   5.5.2   0             8         NO                109          68
> 9/30/2016   5.5.2   0             8         YES               142          73
> 9/30/2016   5.5.2   0             1         YES               73           63
> 9/30/2016   5.5.2   0             1         NO                70           61
> 10/3/2016   4.8.1   0             8         N/A               160          66
> 10/3/2016   4.8.1   0             8         N/A               109          54
> 10/3/2016   4.8.1   0             1         N/A               83           52
> 10/3/2016   4.8.1   0             1         N/A               85           51
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:
>
>> I plan to re-test this in a separate environment that I have more control
>> over and will share the results when I can.
>>
>> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>>
>>> Certainly.  And I would of course welcome anyone else to test this for
>>> themselves especially with facet.method=uif to see if that has indeed
>>> bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
>>> testing is invalid due to variance, problem in process, etc.  One thing I
>>> was pondering is if I should force merge the index to a certain amount of
>>> segments because indexing yields a random number of segments and
>>> deletions.  The only thing stopping me short of doing that were
>>> observations of longer Solr 4 times even with more deletions and similar
>>> number of segments.
>>>
>>> We use Soasta as our testing tool.  Before testing, load is sent for
>>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the test
>>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>>> req/sec and Scenario #2 tested at 100 req/sec.  Each request is different
>>> with input being pulled from data files.  The requests are repeatable test
>>> to test.
>>>
>>> The numbers posted above are average response times as reported by
>>> Soasta.  However, respective time differences are supported by Splunk which
>>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>>> JVM's.
>>>
>>> The versions are deployed to the same machines thereby overlaying the
>>> previous installation.  Going Solr 4 to Solr 5, full indexing is run with
>>> the same input data.  Being in SolrCloud mode, the full indexing comprises
>>> of indexing all documents and then deleting any that were not touched.
>>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not
>>> load with a Solr 5 index.  Testing Solr 4 after reverting yields the same
>>> results as the previous Solr 4 test.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>>> wrote:
>>>
 On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
 > Further testing indicates that any performance difference is not due
 > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
 > deletes.

 Sanity check: Could you describe how you test?

 * How many queries do you issue for each test?
 * Are each query a new one or do you re-use the same query?
 * Do you discard the first X calls?
 * Are the numbers averages, medians or something third?
 * What do you do about disk cache?
 * Are both Solr's on the same machine?
 * Do they 

Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

2017-02-06 Thread David Kramer
For closure, I’ve solved the problem!  It was not using my schema.xml at all.  
I had to change the solrconfig.xml to include <schemaFactory class="ClassicIndexSchemaFactory"/> and comment out the schema-adding processor.

My schema still didn’t work right, but I took the managed-schema and renamed it 
and changed uniqueKey to uuid and everything worked!
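
For anyone hitting the same thing, the two changes amount to the sketch
below (the uuid field name comes from this thread):

In solrconfig.xml:
  <schemaFactory class="ClassicIndexSchemaFactory"/>

In schema.xml:
  <uniqueKey>uuid</uniqueKey>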

Thanks for your time and help.


On 2/2/17, 4:35 PM, "David Kramer"  wrote:

Yes, think of the starving orphan records…

Ours is an eCommerce system, selling mostly shoes.  We have three levels of 
nested objects representing what we sell:
- Product: Mostly title and description
- Item: A specific color and some other attributes, including price. 
Products have 1 or more Items, Items belong to one product.
- SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong 
to one Item.
[PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]

Products, items, and SKUs all have ID numbers. One product will never have 
the same ID as another product, but it’s possible for a product to have the 
same ID as an Item or a SKU. And that is the problem.  So the program that 
creates the import file adds a new field called uuid, that is a P, I, or S (for 
Product, Item, or SKU) followed by the ID.  We did it this way because my 
understanding is Solr can’t implement a compound unique key.  The uuid is 
unique across all documents, not just all documents of the same docType.

So in the case of my unique test to see if it would complain if the UUID of 
a document I was inserting was not unique, I grabbed the first few products 
from the full import file, and changed the IDs so they are not duplicates of 
the real data, but left the UUIDs alone, so they are duplicates of the real 
data, which was already loaded.  

My expectation was that when I loaded the data I would get some  error 
saying that UUID was already used.  YOUR expectation is that the record would 
be overwritten.  What actually happened is that the new documents got added 
with their duplicate UUIDs, which is the worst possible case.  This is why I 
think it’s not respecting my uniqueKey setting in schema.xml.

Does that make more sense?  I hope you can help me understand this 
discrepancy. Thanks for your efforts so far.

On 2/2/17, 3:13 PM, "Mikhail Khludnev"  wrote:

David,
I hardly get the way which IDs are assigned, but beware that repeating
uniqueKey
value causes deleting former occurrence. In case of block join index it
corrupts block structure: parent can't be deleted and left children 
orphans
(.. so touching, I'm sorry). Just make sure that number of deleted docs 
is
0 at first.

On Thu, Feb 2, 2017 at 6:20 PM, David Kramer 
wrote:

> Thanks, for responding. Mikhail.  There are no deleted documents.  
Since
> I’m fairly new to Solr, one of the things I’ve been paranoid about is 
I
> have no way of validating my schema.xml, or know whether Solr is even 
using
> it (I have evidence it’s not, more below). So for each test, I’ve 
wiped out
> the index, recreated, and reimported.
>
> Back to whether my schema.xml is being used, I mentioned that I had to
> come up with a compound UUID field of the first character of the 
docType
> plus the ID, and we put “uuid” (was id) in our
> schema.xml.  Then I deleted and recreated the index and restarted 
Solr.  In
> order to verify it was working, I created an import file that had 
unique
> IDs but UUIDs which were duplicates of existing records, and it 
imported
> the new records even though the UUIDs existed in the database 
already.  I’m
> not sure if Solr should have produced an error or not. I’ll research 
that,
> but I mention that here in case it’s relevant.
>
> Thanks.
>
> On 2/2/17, 6:10 AM, "Mikhail Khludnev"  wrote:
>
> David,
>
> Can you make sure your index doesn't have deleted docs? This  can 
be
> seen
> in SolrAdmiun.
> And can you merge index to avoid having them in the index?
>
> On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <
> david.kra...@shoebuy.com>
> wrote:
>
> >
> >
> > Some background:
> > · The data involved is catalog data, with three nested
> objects:
> > Products, Items, and Skus, in that order. We have a docType 
field on
> each
> > record as a differentiator.
> > · The "id" field in our data is unique within datatype, 
but
> not
> > across datatypes. We added a "uuid" field in our program that
> generates the
> 

Re: Issues with uniqueKey != id?

2017-02-06 Thread David Kramer
I’m just setting that up now. I’m far from a Solr expert, so I won’t swear
we’re doing it right, though.

Our issue is that we have documents, nested 3 deep.  Products, Items, and SKUs. 
 Each has an ID field that’s unique within the document type, but unfortunately 
we have products with the same ID as Items, etc.  So we created a new field 
UUID that’s a concatenation of the document type (first letter, actually) and 
the ID, which is unique.  

The program that creates the import file builds that field, as it’s my 
understanding you can’t use copyfield for the unique key field for some reason 
related to SolrCloud (sorry I don’t have the URL for where I saw that).  I 
would love to be able to copyfield them together though and have the import 
file be smaller.

On 2/3/17, 11:49 AM, "Matthias X Falkenberg"  wrote:

Howdy,

In the Solr Wiki I stumbled upon a somewhat vague statement on the 
uniqueKey:

>  https://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
>  It shouldn't matter whether you rename this to something else (and 
change the <uniqueKey> value), but occasionally it has in the past. We 
recommend that you just leave this definition alone. 

I'd be very grateful for any positive or negative experiences with 
"uniqueKey" not being set to "id" - especially if your experiences are 
related to Solr 6.2.1+.

Many thanks,

Matthias Falkenberg

IBM Deutschland Research & Development GmbH / Vorsitzende des 
Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
HRB 243294





DataImportHandler - Unable to load Tika Config Processing Document # 1

2017-02-06 Thread Anatharaman, Srinatha (Contractor)
Hi,

I am getting the below error while trying to index using DataImportHandler.

The data-config file is mentioned below. ZooKeeper is not able to read
"tikaConfig.xml" in the below statement:

  <entity processor="TikaEntityProcessor" tikaConfig="tikaConfig.xml" ...>

Please help me to resolve this issue

Exception: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load 
Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to load Tika Config Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:96)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:60)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.init(TikaEntityProcessor.java:76)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:75)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:433)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are 
trying to do is not supported in ZooKeeper mode
at 
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:149)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:91)
... 11 more



Re: Time of insert

2017-02-06 Thread Alexandre Rafalovitch
If you are reindexing full documents, there is no way.

If you are actually doing updates using Solr updates XML/JSON, then
you can have a created_date field with default value of NOW.
Similarly, you could probably do something with UpdateRequestProcessor
chains to get that NOW added somewhere.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 6 February 2017 at 15:32, Mahmoud Almokadem  wrote:
> Hello,
>
> I'm using dih on solr 6 for indexing data from sql server. The document can
> be indexed many times according to the updates on it. Is that available to
> get the first time the document inserted to solr?
>
> And how to get the dates of the document updated?
>
> Thanks for help,
> Mahmoud


Re: Help with design choice: join or multiValued field

2017-02-06 Thread Fuad Efendi
Correct: a multiValued field with ~10,000 shop IDs. Use case: a shopping
network in the U.S., for example for a big brand such as Walmart, where the
user implicitly provides an IP address or explicitly a postal code, so that
we can find items in his/her neighbourhood.


You basically provide “join” information via this 10,000-sized collection
of IDs per document. It almost doesn’t have any impact on index size. User
query needs to provide list of preferred IDs (if for example we know user’s
geo location). And for this “Walmart” use case you may also need “Available
Online Only” option, etc.
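
As a sketch of the query side with the multiValued approach (the shop_ids
field name and shop id 42 are made up here, using edismax):

# keep all matches, but boost documents from the user's shop heavily
q=running+shoes&defType=edismax&bq=shop_ids:42^100
# or restrict results to the user's shop only
q=running+shoes&defType=edismax&fq=shop_ids:42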


From: Karl Kildén  
Reply: solr-user@lucene.apache.org 

Date: February 6, 2017 at 5:57:41 AM
To: solr-user@lucene.apache.org 

Subject:  Help with design choice: join or multiValued field

Hello!

I have Items and I have Shops. This is a e-commerce system with items from
thousands of shops all though the inventory is often similar between shops.
Some users can shop from any shop and some only from their default one.


One item can exist in about 10,000 shops.


- When a user logs in they may have a shop pre-selected, so when they
search for items we need to get all matching documents, but if an item is
found in their pre-selected shop we should mark it in the UI.
- They need to be able to filter out only items in their current shop
- Items found in their shop should always be boosted heavily



TLDR:

Either we just have a multiValued field on the item document with all
shops. This would be a multiValued field with ~10,000 values.

Or

Could we have a new document ShopItem that has the shopId and the itemId
(think join table). Then we join this document instead... But we still need
to get the Item document back, and we need bq boosting on item?


Re: custom plugin version

2017-02-06 Thread Zaccheo Bagnati
What a newbie I am! :)
OK, I've seen the methods to override: I'll give it a try. I suppose the
getName output is then shown somewhere in the solr response.
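
For reference, a minimal solrconfig.xml sketch of the <lib> directive Erick
suggested below, pinning one jar explicitly (path and jar name are
hypothetical):

  <lib path="/opt/solr/custom-plugins/myplugins-1.0.1.jar" />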
Thank you again Erick for your patience.
Kind regards
Zaccheo



Il giorno lun 6 feb 2017 alle ore 17:33 Erick Erickson <
erickerick...@gmail.com> ha scritto:

> Sorry, IIRC is an acronym for "If I Recall Correctly", it's not a method
> name ;)
>
> There should be a method in the superclass (DocTransformer) that you
> can use to return information about the plugin, maybe getName or
> toString depending on your needs.
>
> Best,
> Erick
>
> On Mon, Feb 6, 2017 at 4:04 AM, Zaccheo Bagnati 
> wrote:
> > Thank you all for your answers.
> > Directory and <lib> directive suggestions are clear.
> > Can you expand a little bit about IIRC method? I'm not so used to solr
> > code (and btw I'm not an experienced java programmer either).
> >
> > Il giorno ven 3 feb 2017 alle ore 19:25 Erick Erickson <
> > erickerick...@gmail.com> ha scritto:
> >
> >> The plugin itself is responsible for returning information about
> >> itself via an overridden method IIRC, so you have control over what
> >> version is reported.
> >>
> >> As for the other, a slight variant on King's process would be to put
> >> your custom jars in a different directory, then use the <lib>
> >> directive in solrconfig to explicitly load a specific jar rather than
> >> the regex. But separate directories would work as well, a matter of
> >> taste really.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Feb 3, 2017 at 8:21 AM, King Rhoton  wrote:
> >> > What we ended up doing was creating separate directories for each
> >> version of a plugin we had written, and in each collection's
> >> solrconfig.xml, we add the path to the specific directory we wanted that
> >> collection to use via the "<lib>" directive.
> >> >> On Feb 3, 2017, at 2:40 AM, Andrea Gazzarini 
> wrote:
> >> >>
> >> >> Hi Zaccheo,
> >> >> I don't think this is possible, this is something related with the
> >> classloader behavior, and even if there's a "priority" rule in the JVM,
> I
> >> wouldn't rely on that in my application.
> >> >> That could be good in a dev environment where you can specify the
> >> "order" of the imported libraries (e.g. Eclipse), but definitely not so
> >> good outside (IMO).
> >> >>
> >> >> As far as I know, there's no a built-in way to declare the version of
> >> custom components, but you could adopt the same approach of Lucene, with
> >> something like a Version class that drives the behavior of your
> component.
> >> >> In this way you will have
> >> >>
> >> >> * always one jar (better: unique classes FQNs), so no classloader
> issues
> >> >> * a behavior that changes depending on the configuration
> >> >>
> >> >> Best,
> >> >> Andrea
> >> >>
> >> >> On 03/02/17 10:57, Zaccheo Bagnati wrote:
> >> >>> Hi all,
> >> >>> I developed a custom DocTransformer that is loaded from a .jar in
> the
> >> core
> >> >>> "lib" directory. It works but I have now a problem with versioning:
> >> >>> 1. if lib directory contains different versions of the same .jar
> which
> >> one
> >> >>> is loaded? I tried putting both myplugins-1.0.0.jar and
> >> myplugins-1.0.1.jar
> >> >>> and I noticed that the oldest one is loaded. Is there a way to force
> >> >>> specific jar version to be loaded in solrconfig?
> >> >>> 2. More in general: is it possible to expose in solr the version
> >> number for
> >> >>> custom plugins?
> >> >>> Thank you in advance
> >> >>>
> >> >>
> >> >
> >> >
> >> > -
> >> > King Rhoton, c/o Adobe, 601 Townsend, SF, CA 94103
> >> > 415-832-4480 x24480 <(415)%20832-4480> <(415)%20832-4480>
> >> > S support requests should go to search-...@adobe.com
> >> >
> >>
>


Re: Time of insert

2017-02-06 Thread Fuad Efendi
No; a historical log of document updates is not provided. Users need to
implement such functionality themselves if needed.


From: Mahmoud Almokadem  
Reply: solr-user@lucene.apache.org 

Date: February 6, 2017 at 3:32:34 PM
To: solr-user@lucene.apache.org 

Subject:  Time of insert

Hello,

I'm using dih on solr 6 for indexing data from sql server. The document can
> be indexed many times according to the updates on it. Is that available to
get the first time the document inserted to solr?

And how to get the dates of the document updated?

Thanks for help,
Mahmoud


Time of insert

2017-02-06 Thread Mahmoud Almokadem
Hello,

I'm using DIH on Solr 6 for indexing data from SQL Server. The document can
be indexed many times according to the updates on it. Is there a way to
get the first time the document was inserted into Solr?

And how can I get the dates when the document was updated?

Thanks for help,
Mahmoud


RE: Switching from Managed Schema to Manually Edited schema.xml --IS NOT WORKING

2017-02-06 Thread Anatharaman, Srinatha (Contractor)
Erick,

I did as mentioned in that URL, made changes to solrconfig, and kept only
the required fields in schema.xml.
Would you mind sharing config files for indexing a text document?

Regards,
~Sri

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, February 06, 2017 12:22 AM
To: solr-user 
Subject: Re: Switching from Managed Schema to Manually Edited schema.xml --IS 
NOT WORKING

This is still using the managed schema specifically the data_driven_configs 
schema as evidenced by the add-unknown-field-to-the-schema part of the URL.

It looks like you're not _really_ removing the managed schema definitions from 
your solrconfig.xml. You must
1> change solrconfig.xml
2> push it to ZooKeeper
3> reload the collection

before the config changes actually take effect.
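
Concretely, steps 2 and 3 look something like this (the ZooKeeper address,
config name, and paths below are placeholders):

server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd upconfig -confname myconfig -confdir /path/to/conf
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"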

Best,
Erick

On Sun, Feb 5, 2017 at 9:05 PM, Anatharaman, Srinatha (Contractor) 
 wrote:
> Hi ,
>
> I am indexing a Text document and followed the steps defined in below 
> URL to create the schema.xml 
> https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Defini
> tion+in+SolrConfig#SchemaFactoryDefinitioninSolrConfig-SwitchingfromMa
> nagedSchematoManuallyEditedschema.xml
>
> After making the above changes, when I try to index the document using the
> curl command I get the below error:
>
> <response><lst name="responseHeader"><int name="status">400</int>
> <int name="QTime">147</int></lst><lst name="error"><lst name="metadata">
> <str name="error-class">org.apache.solr.common.SolrException</str>
> <str name="root-error-class">org.apache.solr.common.SolrException</str>
> <str name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str>
> <str name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str></lst>
> <str name="msg">Async exception during distributed update: Bad Request</str>
>
>
>
> request: 
> http://165.137.46.219:8983/solr/gsearch_shard1_replica2/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=http%3A%2F%2F165.137.46.218%3A8983%2Fsolr%2Fgsearch_shard2_replica1%2F&wt=javabin&version=2
> <int name="code">400</int></lst></response>
>
> Could someone help me to resolve this issue? How do I create a
> schema.xml file for a text document (document content varies for each
> file)? I want to index the entire document as a whole and search on the
> document content.
>
> Thanks & Regards,
> ~Sri
>
>



RE: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Allison, Timothy B.
Shouldn't have taken you that much effort.  Sorry.

Y, I should probably get around to a patch for: 
https://issues.apache.org/jira/browse/SOLR-9552

Although, frankly, it might be time for Tika 1.15 shortly.

-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com] 
Sent: Monday, February 6, 2017 11:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

Tim, you saved my day ;)

now vsdx files were indexed successfully.

Thank you very much!!!

summary: as a workaround I have in solr-6.4.0\contrib\extraction\lib:

1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
2. curvesapi-1.03.jar


So now I'm waiting for this to be implemented in an official version of
Solr/Tika.

Regards,
Gytis

On Mon, Feb 6, 2017 at 4:16 PM, Allison, Timothy B. 
wrote:

> Argh.  Looks like we need to add curvesapi (BSD 3-clause) to Solr.
>
> For now, add this jar:
> https://mvnrepository.com/artifact/com.github.virtuald/curvesapi/1.03
>
> See also [1]
>
> [1] http://apache-poi.1045710.n5.nabble.com/support-for-
> reading-Microsoft-Visio-2013-vsdx-format-td5721500.html
>
> -Original Message-
> From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> Sent: Monday, February 6, 2017 8:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> sad, but didn't help.
>
> what I did:
>
> 1. stopped solr: bin\solr stop -p 80
> 2. removed poi-ooxml-schemas-3.15.jar from contrib\extraction\lib
> 3. added ooxml-schemas-1.3.jar to contrib\extraction\lib
> 4. restarted solr: bin\solr start -p 80 -m 4g
> 5. tried again to parse vsdx file:
>
> java -Dauto -Dc=db_new02 -Dport=80 -Dfiletypes=vsd,vsdx 
> -Drecursive=yes -jar example/exampledocs/post.jar "I:\Tools"
>
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:80/solr/db_new02/update...
> Entering auto mode. File endings considered are vsd,vsdx
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory I:\Tools (1 files, depth=0)
> POSTing file span ports.vsdx (application/octet-stream) to [base]/extract
> SimplePostTool: WARNING: Solr returned an error #500 (Server Error) 
> for
> url:
> http://localhost:80/solr/db_new02/update/extract?resource.
> name=I%3A%5CTools%5Cspan+ports.vsdx
> SimplePostTool: WARNING: Response: Error 500 Server Error
> HTTP ERROR 500
> Problem accessing /solr/db_new02/update/extract. Reason:
> Server Error
> Caused by: java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
>   at java.lang.Class.getConstructor0(Unknown Source)
>   at java.lang.Class.getDeclaredConstructor(Unknown Source)
>   at org.apache.poi.xdgf.util.ObjectFactory.put(ObjectFactory.java:34)
>   at org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory.<clinit>(GeometryRowFactory.java:39)
>   at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
>   at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
>   at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
>   at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
>   at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
>   at org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
>   at org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
>   at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
>   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
>   at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>   at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>   at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at ...
> 

Re: custom plugin version

2017-02-06 Thread Erick Erickson
Sorry, IIRC is an acronym for "If I Recall Correctly", it's not a method name ;)

There should be a method in the superclass (DocTransformer) that you
can use to return information about the plugin, maybe getName or
toString depending on your needs.

Best,
Erick

On Mon, Feb 6, 2017 at 4:04 AM, Zaccheo Bagnati  wrote:
> Thank you all for your answers.
> Directory and <lib> directive suggestions are clear.
> Can you expand a little bit about IIRC method? I'm not so used to solr code
> (and btw I'm not an experienced java programmer either).
>
> Il giorno ven 3 feb 2017 alle ore 19:25 Erick Erickson <
> erickerick...@gmail.com> ha scritto:
>
>> The plugin itself is responsible for returning information about
>> itself via an overridden method IIRC, so you have control over what
>> version is reported.
>>
>> As for the other, a slight variant on King's process would be to put
>> your custom jars in a different directory, then use the <lib>
>> directive in solrconfig to explicitly load a specific jar rather than
>> the regex. But separate directories would work as well, a matter of
>> taste really.
>>
>> Best,
>> Erick
>>
>> On Fri, Feb 3, 2017 at 8:21 AM, King Rhoton  wrote:
>> > What we ended up doing was creating separate directories for each
>> version of a plugin we had written, and in each collection's
>> solrconfig.xml, we add the path to the specific directory we wanted that
>> collection to use via the "<lib>" directive.
>> >> On Feb 3, 2017, at 2:40 AM, Andrea Gazzarini  wrote:
>> >>
>> >> Hi Zaccheo,
>> >> I don't think this is possible, this is something related with the
>> classloader behavior, and even if there's a "priority" rule in the JVM, I
>> wouldn't rely on that in my application.
>> >> That could be good in a dev environment where you can specify the
>> "order" of the imported libraries (e.g. Eclipse), but definitely not so
>> good outside (IMO).
>> >>
>> >> As far as I know, there's no a built-in way to declare the version of
>> custom components, but you could adopt the same approach of Lucene, with
>> something like a Version class that drives the behavior of your component.
>> >> In this way you will have
>> >>
>> >> * always one jar (better: unique classes FQNs), so no classloader issues
>> >> * a behavior that changes depending on the configuration
>> >>
>> >> Best,
>> >> Andrea
>> >>
>> >> On 03/02/17 10:57, Zaccheo Bagnati wrote:
>> >>> Hi all,
>> >>> I developed a custom DocTransformer that is loaded from a .jar in the
>> core
>> >>> "lib" directory. It works but I have now a problem with versioning:
>> >>> 1. if lib directory contains different versions of the same .jar which
>> one
>> >>> is loaded? I tried putting both myplugins-1.0.0.jar and
>> myplugins-1.0.1.jar
>> >>> and I noticed that the oldest one is loaded. Is there a way to force
>> >>> specific jar version to be loaded in solrconfig?
>> >>> 2. More in general: is it possible to expose in solr the version
>> number for
>> >>> custom plugins?
>> >>> Thank you in advance
>> >>>
>> >>
>> >
>> >
>> > -
>> > King Rhoton, c/o Adobe, 601 Townsend, SF, CA 94103
>> > 415-832-4480 x24480 <(415)%20832-4480>
>> > S support requests should go to search-...@adobe.com
>> >
>>


Re: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Gytis Mikuciunas
Tim, you saved my day ;)

now vsdx files were indexed successfully.

Thank you very much!!!

summary: as a workaround I have in solr-6.4.0\contrib\extraction\lib:

1. ooxml-schemas-1.3.jar instead of poi-ooxml-schemas-3.15.jar
2. curvesapi-1.03.jar
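
For anyone repeating this, both jars can be pulled from Maven Central; a
sketch (double-check the versions against the POI bundled with your Solr):

cd solr-6.4.0/contrib/extraction/lib
mv poi-ooxml-schemas-3.15.jar poi-ooxml-schemas-3.15.jar.bak
wget https://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.3/ooxml-schemas-1.3.jar
wget https://repo1.maven.org/maven2/com/github/virtuald/curvesapi/1.03/curvesapi-1.03.jar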


So now I'm waiting for this to be implemented in an official version of
Solr/Tika.

Regards,
Gytis

On Mon, Feb 6, 2017 at 4:16 PM, Allison, Timothy B. 
wrote:

> Argh.  Looks like we need to add curvesapi (BSD 3-clause) to Solr.
>
> For now, add this jar:
> https://mvnrepository.com/artifact/com.github.virtuald/curvesapi/1.03
>
> See also [1]
>
> [1] http://apache-poi.1045710.n5.nabble.com/support-for-
> reading-Microsoft-Visio-2013-vsdx-format-td5721500.html
>
> -Original Message-
> From: Gytis Mikuciunas [mailto:gyt...@gmail.com]
> Sent: Monday, February 6, 2017 8:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> sad, but didn't help.
>
> what I did:
>
> 1. stopped solr: bin\solr stop -p 80
> 2. removed poi-ooxml-schemas-3.15.jar from contrib\extraction\lib
> 3. added ooxml-schemas-1.3.jar to contrib\extraction\lib
> 4. restarted solr: bin\solr start -p 80 -m 4g
> 5. tried again to parse vsdx file:
>
> java -Dauto -Dc=db_new02 -Dport=80 -Dfiletypes=vsd,vsdx -Drecursive=yes
> -jar example/exampledocs/post.jar "I:\Tools"
>
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:80/solr/db_new02/update...
> Entering auto mode. File endings considered are vsd,vsdx
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory I:\Tools (1 files, depth=0)
> POSTing file span ports.vsdx (application/octet-stream) to [base]/extract
> SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for
> url:
> http://localhost:80/solr/db_new02/update/extract?resource.
> name=I%3A%5CTools%5Cspan+ports.vsdx
> SimplePostTool: WARNING: Response: Error 500 Server Error
> HTTP ERROR 500
> Problem accessing /solr/db_new02/update/extract. Reason:
> Server Error
> Caused by: java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
>   at java.lang.Class.getConstructor0(Unknown Source)
>   at java.lang.Class.getDeclaredConstructor(Unknown Source)
>   at org.apache.poi.xdgf.util.ObjectFactory.put(ObjectFactory.java:34)
>   at org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory.<clinit>(GeometryRowFactory.java:39)
>   at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
>   at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
>   at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
>   at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
>   at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
>   at org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
>   at org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
>   at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
>   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
>   at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>   at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>   at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>   at ...

[ANNOUNCE] Apache Solr 6.4.1 released

2017-02-06 Thread Adrien Grand
6 February 2017, Apache Solr™ 6.4.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 6.4.1.

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search and analytics, rich
document parsing, geospatial search, extensive REST APIs as well as
parallel SQL. Solr is enterprise grade, secure and highly scalable,
providing fault tolerant distributed search and indexing, and powers
the search and navigation features of many of the world's largest
internet sites.

Solr 6.4.1 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:
  https://lucene.apache.org/solr/6_4_1/changes/Changes.html

Solr 6.4.1 contains 5 bug fixes since the 6.4.0 release:
 * "Plugin/Stats" section of the UI doesn't display empty metric types
 * SOLR_SSL_OPTS was mistakenly overwritten in solr.cmd
 * Better validation of filename params in ReplicationHandler
 * Core swapping did not work with new metrics changes in place
 * Admin UI could not find DataImport handlers due to metrics changes
 * AnalyzingInfixSuggester/BlendedInfixSuggester now work with core reload

Further details of changes are available in the change log available at:
  http://lucene.apache.org/solr/6_4_1/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using
may not have replicated the release yet. If that is the case, please
try another mirror. This also applies to Maven access.

-- 
Adrien


RE: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Allison, Timothy B.
Argh.  Looks like we need to add curvesapi (BSD 3-clause) to Solr.

For now, add this jar:
https://mvnrepository.com/artifact/com.github.virtuald/curvesapi/1.03 

See also [1]

[1] 
http://apache-poi.1045710.n5.nabble.com/support-for-reading-Microsoft-Visio-2013-vsdx-format-td5721500.html

-Original Message-
From: Gytis Mikuciunas [mailto:gyt...@gmail.com] 
Sent: Monday, February 6, 2017 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

sad, but didn't help.

what I did:

1. stopped solr: bin\solr stop -p 80
2. removed poi-ooxml-schemas-3.15.jar from contrib\extraction\lib
3. added ooxml-schemas-1.3.jar to contrib\extraction\lib
4. restarted solr: bin\solr start -p 80 -m 4g
5. tried again to parse vsdx file:

java -Dauto -Dc=db_new02 -Dport=80 -Dfiletypes=vsd,vsdx -Drecursive=yes -jar 
example/exampledocs/post.jar "I:\Tools"

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:80/solr/db_new02/update...
Entering auto mode. File endings considered are vsd,vsdx Entering recursive 
mode, max depth=999, delay=0s Indexing directory I:\Tools (1 files, depth=0) 
POSTing file span ports.vsdx (application/octet-stream) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for
url:
http://localhost:80/solr/db_new02/update/extract?resource.name=I%3A%5CTools%5Cspan+ports.vsdx
SimplePostTool: WARNING: Response:   
Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/db_new02/update/extract. Reason:
Server ErrorCaused
by:java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
at java.lang.Class.getConstructor0(Unknown Source)
at java.lang.Class.getDeclaredConstructor(Unknown Source)
at org.apache.poi.xdgf.util.ObjectFactory.put(ObjectFactory.java:34)
at
org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory.clinit(GeometryRowFactory.java:39)
at
org.apache.poi.xdgf.usermodel.section.GeometrySection.init(GeometrySection.java:55)
at
org.apache.poi.xdgf.usermodel.XDGFSheet.init(XDGFSheet.java:77)
at
org.apache.poi.xdgf.usermodel.XDGFShape.init(XDGFShape.java:113)
at
org.apache.poi.xdgf.usermodel.XDGFShape.init(XDGFShape.java:107)
at
org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
at
org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
at
org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
at
org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
at
org.apache.poi.xdgf.usermodel.XmlVisioDocument.init(XmlVisioDocument.java:79)
at
org.apache.poi.xdgf.extractor.XDGFVisioExtractor.init(XDGFVisioExtractor.java:41)
at
org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at

Re: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Gytis Mikuciunas
sad, but didn't help.

what I did:

1. stopped solr: bin\solr stop -p 80
2. removed poi-ooxml-schemas-3.15.jar from contrib\extraction\lib
3. added ooxml-schemas-1.3.jar to contrib\extraction\lib
4. restarted solr: bin\solr start -p 80 -m 4g
5. tried again to parse vsdx file:

java -Dauto -Dc=db_new02 -Dport=80 -Dfiletypes=vsd,vsdx -Drecursive=yes
-jar example/exampledocs/post.jar "I:\Tools"

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:80/solr/db_new02/update...
Entering auto mode. File endings considered are vsd,vsdx
Entering recursive mode, max depth=999, delay=0s
Indexing directory I:\Tools (1 files, depth=0)
POSTing file span ports.vsdx (application/octet-stream) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for
url:
http://localhost:80/solr/db_new02/update/extract?resource.name=I%3A%5CTools%5Cspan+ports.vsdx
SimplePostTool: WARNING: Response:
Error 500 Server Error

HTTP ERROR 500
Problem accessing /solr/db_new02/update/extract. Reason: Server Error
Caused by: java.lang.NoClassDefFoundError: com/graphbuilder/curve/Point
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
at java.lang.Class.getConstructor0(Unknown Source)
at java.lang.Class.getDeclaredConstructor(Unknown Source)
at org.apache.poi.xdgf.util.ObjectFactory.put(ObjectFactory.java:34)
at org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory.<clinit>(GeometryRowFactory.java:39)
at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
at
org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
at
org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
at
org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
at
org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
at
org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at

Re: XMLQueryParser support for Wildcards and Prefix Queries

2017-02-06 Thread Mikhail Khludnev
Hello,

As far as I understand you can hook up the classic Lucene query syntax with <UserQuery/>.
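
Something like this, for example (field and terms are illustrative):

  <UserQuery>title:wildca* OR prefi*</UserQuery>

The text inside <UserQuery> is parsed with the classic query parser, so
wildcard and prefix syntax work there even without dedicated XML elements.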

On Mon, Feb 6, 2017 at 2:18 PM, Puneet Pawaia 
wrote:

> Hi,
>
> I see that the Lucene XMLQueryParser still does not support some query
> types like Wildcard queries and Prefix queries.
> How is the search for terms with wildcards etc proposed to be handled by
> XmlQueryParser?
>
> Thanks.
> Puneet
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Issues with uniqueKey != id?

2017-02-06 Thread alessandro.benedetti
Hi Matthias,
I found some scenarios where having a different name for the unique key
could be tricky.

1) Inter-collection search - Here the issue is not so much whether the
uniqueKey is called "id", but whether the same uniqueKey name is used across
all the collections.
If not, when you aggregate the results, Solr will fail to recognize the
unique keys which differ from the aggregator's uniqueKey (and this
will result in losing the results coming from the collections which use a
different uniqueKey).

2) In the elevation component config you identify the doc to boost by
specifying its id, e.g.

<elevate>
  <query text="foo bar">
    <doc id="1" />
  </query>
</elevate>

The "id" label is slightly misleading, as it actually holds the value of
whatever field is configured as the uniqueKey.
Indeed you don't need to define an "id" field in your documents to work with
the elevation component, as in the code Solr will fetch the uniqueKey field
from the schema (schema.getUniqueKeyField()).
But it may sound a little bit confusing to a new user who decided to use
a different uniqueKey field.

These two are the first considerations that came to my mind; the second
is more about clarifying the syntax of the config file, as you are not going
to have any problem with the uniqueKey field name itself.
If anything else pops up I will let you know!

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issues-with-uniqueKey-id-tp4318662p4318944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom plugin version

2017-02-06 Thread Zaccheo Bagnati
Thank you all for your answers.
The directory and <lib> directive suggestions are clear.
Can you expand a little bit on the "IIRC" method for reporting the version?
I'm not so familiar with the Solr code (and, btw, I'm not an experienced Java
programmer either).
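
Just to check my understanding of the two hints, do you mean something like
this in solrconfig.xml (paths and versions are mine, purely illustrative):

  <lib path="../../myplugins/1.0.1/myplugins-1.0.1.jar" />

and, for the version, an override like the following (sketched on a
SearchComponent; I'm not sure the same hook exists for a DocTransformer)?

  public class MyComponent extends SearchComponent {
    // prepare()/process() omitted for brevity
    @Override
    public String getDescription() { return "my custom component"; }
    @Override
    public String getVersion() { return "1.0.1"; }  // shown under Plugins/Stats
  }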

On Fri, Feb 3, 2017 at 7:25 PM Erick Erickson <
erickerick...@gmail.com> wrote:

> The plugin itself is responsible for returning information about
> itself via an overridden method IIRC, so you have control over what
> version is reported.
>
> As for the other, a slight variant on King's process would be to put
> your custom jars in a different directory then used the 
> directive in solrconfig to explicitly load a specific jar rather than
> the regex. But separate directories would work as well, a matter of
> taste really.
>
> Best,
> Erick
>
> On Fri, Feb 3, 2017 at 8:21 AM, King Rhoton  wrote:
> > What we ended up doing was creating separate directories for each
> > version of a plugin we had written, and in each collection's
> > solrconfig.xml, we add the path to the specific directory we wanted that
> > collection to use via the "<lib dir="..."/>" directive.
> >> On Feb 3, 2017, at 2:40 AM, Andrea Gazzarini  wrote:
> >>
> >> Hi Zaccheo,
> >> I don't think this is possible, this is something related with the
> classloader behavior, and even if there's a "priority" rule in the JVM, I
> wouldn't rely on that in my application.
> >> That could be good in a dev environment where you can specify the
> "order" of the imported libraries (e.g. Eclipse), but definitely not so
> good outside (IMO).
> >>
> >> As far as I know, there's no a built-in way to declare the version of
> custom components, but you could adopt the same approach of Lucene, with
> something like a Version class that drives the behavior of your component.
> >> In this way you will have
> >>
> >> * always one jar (better: unique classes FQNs), so no classloader issues
> >> * a behavior that changes depending on the configuration
> >>
> >> Best,
> >> Andrea
> >>
> >> On 03/02/17 10:57, Zaccheo Bagnati wrote:
> >>> Hi all,
> >>> I developed a custom DocTransformer that is loaded from a .jar in the
> core
> >>> "lib" directory. It works but I have now a problem with versioning:
> >>> 1. if lib directory contains different versions of the same .jar which
> one
> >>> is loaded? I tried putting both myplugins-1.0.0.jar and
> myplugins-1.0.1.jar
> >>> and I noticed that the oldest one is loaded. Is there a way to force
> >>> specific jar version to be loaded in solrconfig?
> >>> 2. More in general: is it possible to expose in solr the version
> number for
> >>> custom plugins?
> >>> Thank you in advance
> >>>
> >>
> >
> >
> > -
> > King Rhoton, c/o Adobe, 601 Townsend, SF, CA 94103
> > 415-832-4480 x24480
> > S support requests should go to search-...@adobe.com
> >
>


RE: Solr 6.4. Can't index MS Visio vsdx files

2017-02-06 Thread Allison, Timothy B.
Ah, ConnectsType.  That's fixed in the most recent version of POI [1], and will 
soon be fixed in Tika [2].  So, no need to open a ticket on Tika's Jira.

> as Tika is failing, could it help or not?

Y, that will absolutely help.  In your Solr contrib/extraction/lib directory, 
you'll see poi-ooxml-schemas-3.xx.jar.  Remove that jar and add 
ooxml-schemas.jar [3].  As documented in [4], poi-ooxml-schemas is a subset of 
the much larger (complete) ooxml-schemas; ConnectsType was not in the subset, 
but it _should_ be in ooxml-schemas.
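
In other words, roughly (exact file names depend on your Solr/POI versions):

  cd <solr-install>/contrib/extraction/lib
  rm poi-ooxml-schemas-3.15.jar
  cp ~/Downloads/ooxml-schemas-1.3.jar .    # the jar from [3]

and restart Solr afterwards.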

Cheers,

 Tim



[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60489
[2] https://issues.apache.org/jira/browse/TIKA-2208 
[3] https://mvnrepository.com/artifact/org.apache.poi/ooxml-schemas/1.3 
[4] http://poi.apache.org/faq.html#faq-N10025 


Hi again,

I've tried with tika-app - didn't help

java -jar tika-app-1.14.jar "I:\Dat\span ports.vsdx"
Exception in thread "main" java.lang.NoClassDefFoundError:
com/microsoft/schemas/office/visio/x2012/main/ConnectsType
at com.microsoft.schemas.office.visio.x2012.main.impl.
PageContentsTypeImpl.getConnects(Unknown Source)
at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(
XDGFBaseContents.java:89)
at org.apache.poi.xdgf.usermodel.XDGFPageContents.onDocumentRead(
XDGFPageContents.java:73)
at org.apache.poi.xdgf.usermodel.XDGFPages.onDocumentRead(
XDGFPages.java:94)
at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(
XmlVisioDocument.java:108)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(
ExtractorFactory.java:207)
at org.apache.tika.parser.microsoft.ooxml.
OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.
parse(OOXMLParser.java:87)
at org.apache.tika.parser.CompositeParser.parse(
CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(
CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(
AutoDetectParser.java:120)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: java.lang.ClassNotFoundException: com.microsoft.schemas.office.
visio.x2012.main.ConnectsType
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 17 more


So next step is to open bug ticket on tika's jira.


And what about your proposed workaround?
"If this is a missing bean issue (sorry, I can't tell from your stacktrace 
which class is missing), as a temporary workaround, you can rm 
"poi-ooxml-schemas" and add the full "ooxml-schemas", and you should be good to 
go. [3]"

as Tika is failing, could it help or not?

Gytis


On Fri, Feb 3, 2017 at 10:31 PM, Allison, Timothy B. 
wrote:

> This is a Tika/POI problem.  Please download tika-app 1.14 [1] or a 
> nightly version of Tika [2] and run
>
> java -jar tika-app.jar 
>
> If the problem is fixed, we'll try to upgrade dependencies in Solr.  
> If it isn't fixed, please open a bug on Tika's Jira.
>
> If this is a missing bean issue (sorry, I can't tell from your 
> stacktrace which class is missing), as a temporary workaround, you can 
> rm "poi-ooxml-schemas" and add the full "ooxml-schemas", and you 
> should be good to go. [3]
>
> Cheers,
>
>   Tim
>
> [1] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.14.jar
>
> [2] https://builds.apache.org/job/Tika-trunk/1193/org.apache.
> tika$tika-app/artifact/org.apache.tika/tika-app/1.15-
> 20170202.203920-124/tika-app-1.15-20170202.203920-124.jar
>
> [3] http://poi.apache.org/faq.html#faq-N10025
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, February 3, 2017 9:49 AM
> To: solr-user 
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> This kind of information extraction comes from Apache Tika that is 
> shipped with Solr. However Solr does not ship every possible parser 
> with its installation. So, I think you are hitting Tika where it 
> manages to figure out what type of content you have, but does not have 
> (Apache POI - another O/S project) library installed.
>
> What you need to do is to get the additional jar from Tika/POI's 
> project/download and make it visible to Solr (probably as an extension 
> jar in a lib folder somewhere - I am a bit hazy on that for latest Solr).
>
> The 

XMLQueryParser support for Wildcards and Prefix Queries

2017-02-06 Thread Puneet Pawaia
Hi,

I see that the Lucene XMLQueryParser still does not support some query
types like Wildcard queries and Prefix queries.
How is the search for terms with wildcards etc proposed to be handled by
XmlQueryParser?

Thanks.
Puneet


Help with design choice: join or multiValued field

2017-02-06 Thread Karl Kildén
Hello!

I have Items and I have Shops. This is an e-commerce system with items from
thousands of shops, although the inventory is often similar between shops.
Some users can shop from any shop and some only from their default one.


One item can exist in about 1 shops.


   - When a user logs in they may have a shop pre-selected, so when they
   search for items we need to get all matching documents, but if an item is
   found in their pre-selected shop we should mark it out in the UI.
   - They need to be able to filter down to only items in their current shop
   - Items found in their shop should always be boosted heavily



TLDR:

Either we just have a multiValued field on the item document with all
shops. This would be a multiValued field with 1 rows

Or

Could we have a new document type, ShopItem, that has the shopId and the
itemId (think join table)? Then we join on this document instead... But we
still need to get the Item document back, and we need bq boosting on the
item. Rough sketches of both options below.
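
To make the options concrete, the queries I have in mind (field names are
illustrative):

Option 1, multiValued shopIds field on the Item document:

  q=laptop&defType=edismax&bq=shopIds:SHOP_42^10    (boost the pre-selected shop)
  q=laptop&fq=shopIds:SHOP_42                       (restrict to the current shop)

Option 2, ShopItem join documents:

  q=laptop&fq={!join from=itemId to=id}shopId:SHOP_42

My worry with option 2 is the boosting: as far as I understand, scores don't
flow through {!join} by default, so the "boost heavily" requirement seems
easier to meet with option 1.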


Re: Issues with uniqueKey != id?

2017-02-06 Thread Matthias X Falkenberg
Hi Susheel,

My question is about the name of the "uniqueKey" field rather than the 
composition of its values. By default, Solr uses a field with the name 
"id". To avoid ambiguity with the applications in my environment, I am 
considering changing the field name to, for example, "docId". Is that 
what you have also done for your compound keys?

One important aspect to consider when using a "uniqueKey" with a 
different name is 
http://lucene.apache.org/solr/6_3_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html
: "This class assumes the id field for your documents is called 'id' - if 
this is not the case, you must set the right name with 
setIdField(String)."

I am wondering whether there are more details or pitfalls that I should be 
aware of?
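
For illustration, the kind of SolrJ setup I expect to need (the zkHost string
and field name are placeholders):

  CloudSolrClient client = new CloudSolrClient.Builder()
      .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr")
      .build();
  client.setIdField("docId");  // needed once uniqueKey is no longer "id"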

Mit freundlichen Grüßen / Kind regards,

Matthias Falkenberg

Team Lead - IBM Digital Experience Development
IBM Watson Content Hub, IBM WebSphere Portal, IBM Web Content Manager
IBM Deutschland Research & Development GmbH / Vorsitzende des 
Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, 
HRB 243294



From:   Susheel Kumar 
To: solr-user@lucene.apache.org
Date:   05-02-17 03:21 AM
Subject:Re: Issues with uniqueKey != id?



Hello,

So far in my experience I haven't come across a scenario where a unique
key/id is not required.  Most of the time, I have put a combination of a few
fields together as an aggregate or compound key (e.g. organization_id +
employee_id).  The reason it makes sense to have some form of unique key is
twofold:
a) if there is no unique key, it becomes practically impossible to update any
existing records, since you can't uniquely identify them, which means your
index will keep growing
b) if there is no unique key, then when you return search results you
wouldn't have anything to relate them with other/external systems

Sometimes you may have time-series data, in which case a timestamp or a
combination of timestamp and other fields may make sense, but yes, a unique
key is not mandatory.
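
For example, building such a compound key at index time (SolrJ, field names
are illustrative):

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", orgId + "_" + employeeId);  // compound unique key
  doc.addField("organization_id", orgId);
  doc.addField("employee_id", employeeId);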

Thanks,
Susheel

On Fri, Feb 3, 2017 at 11:49 AM, Matthias X Falkenberg 

wrote:

> Howdy,
>
> In the Solr Wiki I stumbled upon a somewhat vague statement on the
> uniqueKey:
>
> >  https://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field
> >  It shouldn't matter whether you rename this to something else (and
> change the  value), but occasionally it has in the past. We
> recommend that you just leave this definition alone.
>
> I'd be very grateful for any positive or negative experiences with
> "uniqueKey" not being set to "id" - especially if your experiences are
> related to Solr 6.2.1+.
>
> Many thanks,
>
> Matthias Falkenberg
>
> IBM Deutschland Research & Development GmbH / Vorsitzende des
> Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht 
Stuttgart,
> HRB 243294
>
>