Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
That sounds like a good option. So the Spark job will connect to MySQL and
create Solr documents, which are pushed into Solr using SolrJ, probably in
batches.
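
For illustration, here is a rough sketch of what that might look like,
assuming Spark's JDBC data source and a plain SolrJ HttpSolrClient. The JDBC
URL, table names, collection name and field names below are placeholders,
not something taken from this thread.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MySqlToSolr {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("mysql-to-solr").getOrCreate();

        // Read two (hypothetical) tables through Spark's JDBC source and denormalize them.
        Dataset<Row> products = readTable(spark, "product");
        Dataset<Row> categories = readTable(spark, "category");
        Dataset<Row> denormalized = products.join(
                categories, products.col("id").equalTo(categories.col("product_id")), "left_outer");

        // Push the rows to Solr with SolrJ, one client per partition, in batches.
        denormalized.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            try (HttpSolrClient solr =
                     new HttpSolrClient.Builder("http://solrhost:8983/solr/mycollection").build()) {
                List<SolrInputDocument> batch = new ArrayList<>();
                while (rows.hasNext()) {
                    Row row = rows.next();
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", row.getAs("id"));
                    doc.addField("name", row.getAs("name"));
                    Object category = row.getAs("category");
                    if (category != null) {
                        doc.addField("categories", category);
                    }
                    batch.add(doc);
                    if (batch.size() >= 1000) {
                        solr.add(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    solr.add(batch);
                }
                // No explicit commit; let the server's autoCommit/autoSoftCommit control visibility.
            }
        });

        spark.stop();
    }

    private static Dataset<Row> readTable(SparkSession spark, String table) {
        return spark.read().format("jdbc")
                .option("url", "jdbc:mysql://dbhost:3306/mydb")
                .option("dbtable", table)
                .option("user", "dbuser")
                .option("password", "dbpass")
                .load();
    }
}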

On Thu, Apr 12, 2018 at 10:48 PM, Rahul Singh 
wrote:

> If you want speed, Spark is the fastest easiest way. You can connect to
> relational tables directly and import or export to CSV / JSON and import
> from a distributed filesystem like S3 or HDFS.
>
> Combining a dfs with spark and a highly available SolR - you are
> maximizing all threads.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Apr 12, 2018, 1:10 PM -0400, Sujay Bawaskar ,
> wrote:
> > Thanks Rahul. Data source is JdbcDataSource with MySQL database. Data
> size
> > is around 100GB.
> > I am not much familiar with spark but are you suggesting that we should
> > create document by merging distinct RDBMS tables in using RDD?
> >
> > On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh <
> rahul.xavier.si...@gmail.com
> > wrote:
> >
> > > How much data and what is the database source? Spark is probably the
> > > fastest way.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <
> sujaybawas...@gmail.com>,
> > > wrote:
> > > > Hi,
> > > >
> > > > We are using DIH with SortedMapBackedCache but as data size
> increases we
> > > > need to provide more heap memory to solr JVM.
> > > > Can we use multiple CSV file instead of database queries and later
> data
> > > in
> > > > CSV files can be joined using zipper? So bottom line is to create CSV
> > > files
> > > > for each of entity in data-config.xml and join these CSV files using
> > > > zipper.
> > > > We also tried EHCache based DIH cache but since EHCache uses MMap IO
> its
> > > > not good to use with MMapDirectoryFactory and causes to exhaust
> physical
> > > > memory on machine.
> > > > Please suggest how can we handle use case of importing huge amount of
> > > data
> > > > into solr.
> > > >
> > > > --
> > > > Thanks,
> > > > Sujay P Bawaskar
> > > > M:+91-77091 53669
> > >
> >
> >
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
Thanks Rahul. The data source is JdbcDataSource with a MySQL database, and
the data size is around 100GB.
I am not very familiar with Spark, but are you suggesting that we should
create documents by merging distinct RDBMS tables using RDDs?

On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh 
wrote:

> How much data and what is the database source? Spark is probably the
> fastest way.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar ,
> wrote:
> > Hi,
> >
> > We are using DIH with SortedMapBackedCache but as data size increases we
> > need to provide more heap memory to solr JVM.
> > Can we use multiple CSV file instead of database queries and later data
> in
> > CSV files can be joined using zipper? So bottom line is to create CSV
> files
> > for each of entity in data-config.xml and join these CSV files using
> > zipper.
> > We also tried EHCache based DIH cache but since EHCache uses MMap IO its
> > not good to use with MMapDirectoryFactory and causes to exhaust physical
> > memory on machine.
> > Please suggest how can we handle use case of importing huge amount of
> data
> > into solr.
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


DIH with huge data

2018-04-12 Thread Sujay Bawaskar
Hi,

We are using DIH with SortedMapBackedCache, but as the data size increases
we need to give the Solr JVM more and more heap memory.
Can we use multiple CSV files instead of database queries, and then join the
data in the CSV files using zipper? So the bottom line is to create a CSV
file for each entity in data-config.xml and join these CSV files using
zipper.
We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it
is not a good fit alongside MMapDirectoryFactory and can exhaust the
physical memory on the machine.
Please suggest how we can handle the use case of importing a huge amount of
data into Solr.
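
For what it's worth, below is a rough sketch of what a zipper join between
two sorted sub-entities can look like in data-config.xml. The tables, columns
and field names are invented, zipper requires both queries to return rows
ordered by the join key, and the exact syntax should be verified against your
Solr version. Joining per-entity CSV files instead would mean reading each
file with LineEntityProcessor plus a transformer, with the same sorted-key
requirement.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/mydb" user="dbuser" password="dbpass"/>
  <document>
    <entity name="parent" query="SELECT id, name FROM parent ORDER BY id">
      <entity name="child" join="zipper"
              query="SELECT parent_id, tag FROM child ORDER BY parent_id"
              where="parent_id=parent.id">
        <field column="tag" name="tags"/>
      </entity>
    </entity>
  </document>
</dataConfig>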

-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Sujay Bawaskar
Which directory factory is defined in solrconfig.xml? Your JVM heap should
be tuned with respect to that.
How is Solr being used: is it more updates and fewer queries, or fewer
updates and more queries?
What is the OOM error? Is it frequent GC or error 12?
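
For reference, the directory factory is the <directoryFactory/> element in
solrconfig.xml; the stock configuration looks roughly like the lines below
(shown here only as a reminder of where to look):

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>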

On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
aharrison-ful...@mintel.com> wrote:

> Hey Jesus,
>
> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to them.
>
> Cheers!
> Adam
>
> On 11 April 2018 at 11:22, Jesus Olivan  wrote:
>
> > Hi Adam,
> >
> > IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical RAM,
> > your JVM can afford more RAM without threading penalties due to outside
> > heap RAM lacks.
> >
> > Another good one would be to increase -XX:CMSInitiatingOccupancyFraction
> > =50
> > to 75. I think that CMS collector works better when Old generation space
> is
> > more populated.
> >
> > I usually use to set Survivor spaces to lesser size. If you want to try
> > SurvivorRatio to 6, i think performance would be improved.
> >
> > Another good practice for me would be to set an static NewSize instead
> > of -XX:NewRatio=3.
> > You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb (one
> third
> > of total heap space is recommended).
> >
> > Finally, my best results after a deep JVM I+D related to Solr, came
> > removing ScavengeBeforeRemark flag and applying this new one: +
> > ParGCCardsPerStrideChunk.
> >
> > However, It would be a good one to set ParallelGCThreads and
> > *ConcGCThreads *to their optimal value, and we need you system CPU number
> > to know it. Can you provide this data, please?
> >
> > Regards
> >
> >
> > 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
> > aharrison-ful...@mintel.com
> > >:
> >
> > > Hey all,
> > >
> > > I was wondering if I could get some JVM/GC tuning advice to resolve an
> > > issue that we are experiencing.
> > >
> > > Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> > > render would be greatly appreciated.
> > >
> > > Our Solr cloud nodes are having issues throwing OOM exceptions under
> > load.
> > > This issue has only started manifesting itself over the last few months
> > > during which time the only change I can discern is an increase in index
> > > size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
> > > index is currently 58G and the server has 46G of physical RAM and runs
> > > nothing other than the Solr node.
> > >
> > > The JVM is invoked with the following JVM options:
> > > -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=
> > 6000
> > > -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> > > -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
> > -XX:+ManagementServer
> > > -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> > > -XX:NewRatio=3 -XX:OldPLABSize=16
> > > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 3
> > > /data/gnpd/solr/logs
> > > -XX:ParallelGCThreads=4
> > > -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> > > -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> > > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> > > -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> > > -XX:TargetSurvivorRatio=90
> > > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> > > -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> > >
> > > These values were decided upon several years ago by a colleague based upon
> > > some suggestions from this mailing group with an index size ~25G.
> > >
> > > I have imported the GC logs into GCViewer and attached a link to a
> > > screenshot showing the lead up to a OOM crash.  Interestingly the young
> > > generation space is almost empty before the repeated GC's and
> subsequent
> > > crash.
> > > https://imgur.com/a/Wtlez
> > >
> > > I was considering slowly increasing the amount of heap available to the
> > JVM
> > > slowly until the crashes, any other suggestions?  I'm looking at trying
> > to
> > > get the nodes stable without having issues with the GC taking forever
> to
> > > run.
> > >
> > > Additional information can be provided on request.
> > >
> > > Cheers!
> > > Adam
> > >

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
Since this is a scheduled job, I think you can get rid of the commit and
optimize calls that are invoked from the scheduled job and rely on the
autoCommit settings in solrconfig.xml instead.
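
A typical (purely illustrative) server-side setup in solrconfig.xml looks
something like this; the interval values are only examples:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
</autoSoftCommit>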

On Tue, Mar 27, 2018 at 6:13 PM, YELESWARAPU, VENKATA BHAN <
vyeleswar...@statestreet.com> wrote:

> Information Classification: ** Limited Access
>
> Thanks for your response Sujay.
> Solr Version - 4.3.1
> Yes, we are using client api to generate index files.
> I don't see those parameters configured outside or in the logs, but
> indexing job is scheduled, which I think will take care of these.
> We have the option to schedule it to run in min intervals.
>
> Thank you,
> Dutt
>
>
> -Original Message-
> From: Sujay Bawaskar [mailto:sujaybawas...@gmail.com]
> Sent: Tuesday, March 27, 2018 8:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Help Needed - Indexing Related
>
> Few questions here,
>
> Are you using solrj client from java application?
> What is version of solr?
> How frequently commit and optimize operation is called from solr client?
> If commit and optimize are not called from client what is value for
> solr.autoCommit.maxTime and solr.autoSoftCommit.maxTime?
> What is current TPS and expected TPS?
>
> On Tue, Mar 27, 2018 at 5:38 PM, YELESWARAPU, VENKATA BHAN <
> vyeleswar...@statestreet.com> wrote:
>
> > Information Classification: ** Limited Access
> >
> > Hi Solr Team,
> >
> > Hope you are doing well. I have been struggling with indexing for a
> > week now.
> > Yesterday I deleted all indexing files and tried re-indexing. It
> > failed saying unable to open a new searcher. Also that _0.si file is
> missing.
> > Today I redeployed the whole application and tried indexing. Now
> > facing the below issues.
> > If you could guide me on this or if there is any documentation around
> > this, that would greatly help. Appreciate your time on this.
> >
> > 2018-03-27 07:53:59,896 DEBUG (DefaultdatabaseFunctions.java:319) -
> > lock [SolrIndexingJobReadFromQueue] acquired
> > 2018-03-27 07:53:59,924 DEBUG (SolrIndexingJob.java:193) - done
> > sleeping
> > 2018-03-27 07:53:59,929 DEBUG (DefaultdatabaseFunctions.java:313) -
> > lock [SolrIndexingJobReadFromQueue] already exists, will try updating
> > it now
> > 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> > Object Alerts.CWI_096850 not fetched because its identifier
> > appears to be already in processing
> > 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> > Object Alerts.CWI_096850 not fetched because its identifier
> > appears to be already in processing
> > 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> > Object Alerts.CWI_096850 not fetched because its identifier
> > appears to be already in processing
> > 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> > Object Alerts.CWI_096854 not fetched because its identifier
> > appears to be already in processing
> >
> > 2018-03-27 07:54:31,128 WARN  (SolrIndexingJob.java:107) - Solr
> > indexing job failed
> > java.lang.IndexOutOfBoundsException: Index: 16, Size: 10
> > at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> > at java.util.ArrayList.set(ArrayList.java:426)
> > at com.actimize.dao.DaoUtil.orderList(DaoUtil.java:215)
> > at
> > com.actimize.dao.AlertDaoImpl.findAlertsByIdentifierForIndex
> > ing(AlertDaoImpl.java:2347)
> > at sun.reflect.GeneratedMethodAccessor2119.invoke(Unknown
> Source)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.springframework.aop.support.AopUtils.
> > invokeJoinpointUsingReflection(AopUtils.java:317)
> > at org.springframework.aop.framework.ReflectiveMethodInvocation.
> > invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> > at org.springframework.aop.framework.ReflectiveMethodInvocation.
> > proceed(ReflectiveMethodInvocation.java:150)
> > at com.actimize.infrastructure.perfmon.
> > PerformanceMonitorInterceptor.invokeUnderTrace(
> > PerformanceMonitorInterceptor.java:57)
> > at org.springframework.aop.interceptor.AbstractTraceInterceptor.
> > invoke(AbstractTraceInterceptor.java:111)
> > at org.springframework.aop.framework.ReflectiveMethodInvocation.
> > proceed(ReflectiveMethodInvocation.java:172)
> > at org.springframework.aop.framework.JdkDynamicAopProxy.
> > invoke(JdkDynamicAopProxy.java:204)

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
A few questions here:

Are you using the SolrJ client from a Java application?
What version of Solr is it?
How frequently are commit and optimize operations called from the Solr client?
If commit and optimize are not called from the client, what are the values of
solr.autoCommit.maxTime and solr.autoSoftCommit.maxTime?
What are the current TPS and the expected TPS?

On Tue, Mar 27, 2018 at 5:38 PM, YELESWARAPU, VENKATA BHAN <
vyeleswar...@statestreet.com> wrote:

> Information Classification: ** Limited Access
>
> Hi Solr Team,
>
> Hope you are doing well. I have been struggling with indexing for a week
> now.
> Yesterday I deleted all indexing files and tried re-indexing. It failed
> saying unable to open a new searcher. Also that _0.si file is missing.
> Today I redeployed the whole application and tried indexing. Now facing
> the below issues.
> If you could guide me on this or if there is any documentation around
> this, that would greatly help. Appreciate your time on this.
>
> 2018-03-27 07:53:59,896 DEBUG (DefaultdatabaseFunctions.java:319) - lock
> [SolrIndexingJobReadFromQueue] acquired
> 2018-03-27 07:53:59,924 DEBUG (SolrIndexingJob.java:193) - done sleeping
> 2018-03-27 07:53:59,929 DEBUG (DefaultdatabaseFunctions.java:313) - lock
> [SolrIndexingJobReadFromQueue] already exists, will try updating it now
> 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> Object Alerts.CWI_096850 not fetched because its identifier appears to
> be already in processing
> 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> Object Alerts.CWI_096850 not fetched because its identifier appears to
> be already in processing
> 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> Object Alerts.CWI_096850 not fetched because its identifier appears to
> be already in processing
> 2018-03-27 07:53:59,971 DEBUG (SolrIndexingQueueServiceImpl.java:54) -
> Object Alerts.CWI_096854 not fetched because its identifier appears to
> be already in processing
>
> 2018-03-27 07:54:31,128 WARN  (SolrIndexingJob.java:107) - Solr indexing
> job failed
> java.lang.IndexOutOfBoundsException: Index: 16, Size: 10
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.set(ArrayList.java:426)
> at com.actimize.dao.DaoUtil.orderList(DaoUtil.java:215)
> at com.actimize.dao.AlertDaoImpl.findAlertsByIdentifierForIndex
> ing(AlertDaoImpl.java:2347)
> at sun.reflect.GeneratedMethodAccessor2119.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.
> invokeJoinpointUsingReflection(AopUtils.java:317)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:150)
> at com.actimize.infrastructure.perfmon.
> PerformanceMonitorInterceptor.invokeUnderTrace(
> PerformanceMonitorInterceptor.java:57)
> at org.springframework.aop.interceptor.AbstractTraceInterceptor.
> invoke(AbstractTraceInterceptor.java:111)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:172)
> at org.springframework.aop.framework.JdkDynamicAopProxy.
> invoke(JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy39.findAlertsByIdentifierForIndexing(Unknown
> Source)
> at com.actimize.services.AlertsServiceImpl.
> findAlertsByIdentifierForIndexing(AlertsServiceImpl.java:5568)
> at sun.reflect.GeneratedMethodAccessor2118.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.
> invokeJoinpointUsingReflection(AopUtils.java:317)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:150)
> at org.springframework.transaction.interceptor.
> TransactionInterceptor$1.proceedWithInvocation(
> TransactionInterceptor.java:96)
> at org.springframework.transaction.interceptor.
> TransactionAspectSupport.invokeWithinTransaction(
> TransactionAspectSupport.java:260)
> at org.springframework.transaction.interceptor.
> TransactionInterceptor.invoke(TransactionInterceptor.java:94)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:172)
> at com.actimize.infrastructure.util.DeadLockLockingInterceptor.
> invoke(DeadLockLockingInterceptor.java:40)
> 

Re: Reg:- Indexing MySQL data with Solr

2017-12-01 Thread Sujay Bawaskar
You can use the Data Import Handler (DIH) with a cache; it's faster.

See the documentation:
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html
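
As a rough illustration (the table and field names below are invented), a
cached child entity in data-config.xml looks something like this; the child
query is executed once and subsequent lookups are served from the in-memory
cache:

<document>
  <entity name="product" query="SELECT id, name FROM product">
    <entity name="category"
            query="SELECT product_id, category FROM product_category"
            cacheImpl="SortedMapBackedCache"
            cacheKey="product_id" cacheLookup="product.id">
      <field column="category" name="categories"/>
    </entity>
  </entity>
</document>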

On Sat, Dec 2, 2017 at 12:21 AM, @Nandan@ 
wrote:

> Hi ,
> I am working on an Ecommerce database . I have more then 40 tables and
> around 20GB of data.
> I want to index data with solr for more effective search feature.
> Please tell me how to index MySQL data with Apache solr.
> Thanks in advance
> Nandan Priyadarshi
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: DIH not stop

2017-11-16 Thread Sujay Bawaskar
  (qtp1638215613-15) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=1
> 2017-11-16 07:21:25.771 INFO  (qtp1638215613-43) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:27.757 INFO  (qtp1638215613-43) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:29.731 INFO  (commitScheduler-17-thread-1) [   x:cea2]
> o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,
> openSearcher=false,waitSearcher=true,expungeDeletes=false,
> softCommit=false,prepareCommit=false}
> 2017-11-16 07:21:29.731 INFO  (commitScheduler-17-thread-1) [   x:cea2]
> o.a.s.u.SolrIndexWriter Calling setCommitData with
> IW:org.apache.solr.update.SolrIndexWriter@44dcd2b6 commitCommandVersion:0
> 2017-11-16 07:21:29.838 INFO  (commitScheduler-17-thread-1) [   x:cea2]
> o.a.s.s.SolrIndexSearcher Opening [Searcher@2c6c6e0d[cea2] realtime]
> 2017-11-16 07:21:29.841 INFO  (commitScheduler-17-thread-1) [   x:cea2]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-11-16 07:21:30.075 INFO  (qtp1638215613-43) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:32.072 INFO  (qtp1638215613-43) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:34.077 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:36.076 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:38.064 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:40.064 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:42.068 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=1
> 2017-11-16 07:21:44.068 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=0
> 2017-11-16 07:21:46.075 INFO  (qtp1638215613-14) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=1
> 2017-11-16 07:21:48.075 INFO  (qtp1638215613-43) [   x:cea2]
> o.a.s.c.S.Request [cea2]  webapp=/solr path=/dataimport
> params={indent=on&wt=json&command=status&_=1510816148489} status=0 QTime=2
> ^C
>
>
> Can Ezgi Aydemir
> Oracle Veri Tabanı Yöneticisi & Oracle Database Admin
> İşlem Coğrafi Bilgi Sistemleri Müh. & Eğitim AŞ.
> 2024.Cadde No:14, Beysukent 06800, Ankara, Türkiye
> T : 0 312 233 50 00 .:. F : 0312 235 56 82
> E :  cayde...@islem.com.tr .:. W : http://www.islem.com.tr
>
>
> -Original Message-
> From: Sujay Bawaskar [mailto:sujaybawas...@gmail.com]
> Sent: 16 November 2017 10:11
> To: solr-user@lucene.apache.org
> Subject: Re: DIH not stop
>
> I have experience this problem recently with MySQL and after checking
> solr.log found that there was a connection timeout from MySQL.
> Please check solr.log for any Cassandra connection errors.
>
> Thanks,
> Sujay
>
> On Thu, Nov 16, 2017 at 12:29 PM, Can Ezgi Aydemir 
> wrote:
>
> > Hi all,
> >
> > I configured Solr and Cassandra. Running full data import but not stop.
> > Only core load during this process, stop it. Seeing that stop dih, not
> > write dataimport.properties.
> >
> > In dataconfig.xml file, i define simplepropertywriter type and filename.
> > But not write it in dataimport.properties file.
> >
> > How can i solve this problem?
> >
> > Thx
> >
> > Regards.

Re: DIH not stop

2017-11-15 Thread Sujay Bawaskar
I experienced this problem recently with MySQL, and after checking solr.log
I found that there was a connection timeout from MySQL.
Please check solr.log for any Cassandra connection errors.

Thanks,
Sujay

On Thu, Nov 16, 2017 at 12:29 PM, Can Ezgi Aydemir 
wrote:

> Hi all,
>
> I configured Solr and Cassandra. Running full data import but not stop.
> Only core load during this process, stop it. Seeing that stop dih, not
> write dataimport.properties.
>
> In dataconfig.xml file, i define simplepropertywriter type and filename.
> But not write it in dataimport.properties file.
>
> How can i solve this problem?
>
> Thx
>
> Regards.
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Solr server partial update is very slow

2017-11-12 Thread Sujay Bawaskar
Hi Shawn,

At the time of indexing with partial updates, CPU utilization is at most 12%.
The Solr JVM heap size is a minimum of 40GB because we are using the data
import handler with SortedMapBackedCache, which uses the Java heap during a
full import. Memory utilization is also decent while partial updates are
running. The only thing is that when partial updates are running at 700
updates per minute, QTime reaches 5 seconds. Is it the case that direct
partial updates from 200 clients cause index merging to be slower? Here we
open 40*200 HTTP Solr connections (at least 40 partial updates from each
process) with SolrJ for partial updates.

Thanks,
Sujay

On Mon, Nov 13, 2017 at 1:56 AM, Shawn Heisey  wrote:

> On 11/11/2017 8:17 AM, Sujay Bawaskar wrote:
>
> Thanks Shawn. Its good to know that OpenSearcher is not causing any issue.
>>
>> We are good with 15 minutes of softCommit interval . We are using stand
>> alone solr instance and not solr cloud. There are 100 cores on this machine
>> but index ingestion was going on for single core. Total size of index is
>> 100GB out of which this one with 10GB data is largest one.
>> Standalone solr machine is hosted on dedicated instances with 4 CPU cores
>> and 120 GB Memory. Solr JVM is configured with xms=40G and xmx=80G. In this
>> case partial update is being performed by 200 solr clients simultaneously.
>>
>
> Looks like I managed to send my previous reply direct instead of to the
> list.  I'm sending this one to the list.
>
> Why is your heap 80GB?  That's *huge*.  With 80GB of the 120GB total used
> by one Java process, you've got about 40GB left to cache the index --
> assuming that this one Solr instance is the only significant program
> running on the server.  40GB to cache a 100GB index might be enough for
> good performance, or it might not be enough.  There are no easy formulas
> for figuring that out. A heap that size is also likely to experience some
> occasional stop-the-world GC pauses that could take a VERY long time.
>
> My dev Solr server (6.6.2-SNAPSHOT) has all of the indexes on it that use
> several servers in production.  That's over 700GB of index data.  This
> server runs with a 28GB heap, and the only reason it's *that* high is
> because I had to increase the heap in order to successfully run some
> data-mining grouping and facet queries.  Normally it works just fine with
> about a 13GB heap.
>
> 200 simultaneous indexing requests seems excessive to me, especially when
> the Solr server only has 4 CPUs.  Indexing several requests at the same
> time is the best way to achieve fast indexing, but if you have too many,
> it could actually get *worse* than indexing with only one thread/process.
>
> Thanks,
> Shawn
>
> 
> For completeness, below is the full text of the thread where I replied
> before:
>
> On Fri, Nov 10, 2017 at 8:59 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> On 11/9/2017 10:25 PM, Sujay Bawaskar wrote:
>> > We are getting below log without invoking commit operation after
>> every
>> > partial update call. We have configured soft commit and commit
>> time as
>> > below. With below configuration we are able to perform 800
>> partial updates
>> > per minutes which I think is very slow. Our Index size is 10GB
>> for this
>> > particular core.
>> > Is there any configuration we are missing here?
>> >
>> > Log:
>> > 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [   x:collection]
>> > o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection]
>> realtime]
>>
>> This is a *realtime* searcher, for the realtime get handler.
>> These will
>> be recreated frequently as you index.  Opening realtime searchers
>> should
>> be extremely fast and not really affect the system much, and this
>> happens without any configuration or user action.
>>
>> The realtime get handler, which is typically accessed as /get, can
>> retrieve documents that haven't been made accessible to the normal
>> index
>> searcher.  If this feature were likely to cause performance
>> problems, it
>> would not be turned on by default.
>>
>> https://lucene.apache.org/solr/guide/6_6/realtime-get.html
>> <https://lucene.apache.org/solr/guide/6_6/realtime-get.html>
>>
>> Are you seeing any other frequent logs about opening searchers that
>> aren't realtime?
>>
>> > Commit configuration:
>> > solr.autoCommit.maxTime:180
> > solr.autoSoftCommit.maxTime:90

Re: Solr server partial update is very slow

2017-11-10 Thread Sujay Bawaskar
Hi Erick,

Some of the partial updates are taking a very long time. The average QTime
for updates in a 15-minute interval is 14344 ms.

2017-11-10 08:15:11.863 INFO  (qtp225493257-43961) [   x:collection]
o.a.s.c.S.Request [collection]  webapp=/solr path=/update
params={wt=javabin&version=2} status=0 QTime=10073904.

On Fri, Nov 10, 2017 at 12:27 PM, Sujay Bawaskar 
wrote:

> Any reason we get below log even if client does not issue commit or we can
> ignore this log?
>
> Log: 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]
>  o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection]
> realtime]
>
> On Fri, Nov 10, 2017 at 12:06 PM, Sujay Bawaskar 
> wrote:
>
>> We are not issuing client side commit for partial update. We have 
>> openSearcher=false
>> in solrconfig.xml, in this case we have set softCommit interval as 15
>> minutes. Solr version is 6.4.1.
>>
>> Thanks,
>> Sujay
>>
>> On Fri, Nov 10, 2017 at 11:58 AM, Erick Erickson > > wrote:
>>
>>> bq: We are getting below log without invoking commit operation after
>>> every partial update call
>>>
>>> Not sure what you mean here. If you're issuing a commit from the
>>> client every time you update a doc (or even a batch) that's an
>>> anti-pattern and you're opening searchers all the time. Don't do that
>>> ;).
>>>
>>> I'd set my autoCommit time to something reasonable like 60 seconds (or
>>> even 15) with openSearcher=false in solrconfig.xml. Set your soft
>>> commit to however long you can stand, I try for at least 10 seconds,
>>> but 60 or even 300 if possible, it all depends on how long after you
>>> index a document it has to be available for search.
>>>
>>> The settings you have are dangerous. See:
>>>
>>> https://lucidworks.com/2013/08/23/understanding-transaction-
>>> logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Nov 9, 2017 at 9:25 PM, Sujay Bawaskar 
>>> wrote:
>>> > Hi,
>>> >
>>> > We are getting below log without invoking commit operation after every
>>> > partial update call. We have configured soft commit and commit time as
>>> > below. With below configuration we are able to perform 800 partial
>>> updates
>>> > per minutes which I think is very slow. Our Index size is 10GB for this
>>> > particular core.
>>> > Is there any configuration we are missing here?
>>> >
>>> > Log:
>>> > 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]
>>> > o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection]
>>> realtime]
>>> >
>>> > Commit configuration:
>>> > solr.autoCommit.maxTime:180
>>> > solr.autoSoftCommit.maxTime:90
>>> >
>>> >
>>> >
>>> > --
>>> > Thanks,
>>> > Sujay P Bawaskar
>>> > M:+91-77091 53669
>>>
>>
>>
>>
>> --
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
>>
>
>
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
Is there any reason we get the below log even if the client does not issue a
commit, or can we ignore this log?

Log: 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [
x:collection]  o.a.s.s.SolrIndexSearcher
Opening  [Searcher@7010b1c6[collection] realtime]

On Fri, Nov 10, 2017 at 12:06 PM, Sujay Bawaskar 
wrote:

> We are not issuing client side commit for partial update. We have 
> openSearcher=false
> in solrconfig.xml, in this case we have set softCommit interval as 15
> minutes. Solr version is 6.4.1.
>
> Thanks,
> Sujay
>
> On Fri, Nov 10, 2017 at 11:58 AM, Erick Erickson 
> wrote:
>
>> bq: We are getting below log without invoking commit operation after
>> every partial update call
>>
>> Not sure what you mean here. If you're issuing a commit from the
>> client every time you update a doc (or even a batch) that's an
>> anti-pattern and you're opening searchers all the time. Don't do that
>> ;).
>>
>> I'd set my autoCommit time to something reasonable like 60 seconds (or
>> even 15) with openSearcher=false in solrconfig.xml. Set your soft
>> commit to however long you can stand, I try for at least 10 seconds,
>> but 60 or even 300 if possible, it all depends on how long after you
>> index a document it has to be available for search.
>>
>> The settings you have are dangerous. See:
>>
>> https://lucidworks.com/2013/08/23/understanding-transaction-
>> logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 9, 2017 at 9:25 PM, Sujay Bawaskar 
>> wrote:
>> > Hi,
>> >
>> > We are getting below log without invoking commit operation after every
>> > partial update call. We have configured soft commit and commit time as
>> > below. With below configuration we are able to perform 800 partial
>> updates
>> > per minutes which I think is very slow. Our Index size is 10GB for this
>> > particular core.
>> > Is there any configuration we are missing here?
>> >
>> > Log:
>> > 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]
>> > o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection]
>> realtime]
>> >
>> > Commit configuration:
>> > solr.autoCommit.maxTime:180
>> > solr.autoSoftCommit.maxTime:90
>> >
>> >
>> >
>> > --
>> > Thanks,
>> > Sujay P Bawaskar
>> > M:+91-77091 53669
>>
>
>
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
We are not issuing a client-side commit for partial updates. We have
openSearcher=false in solrconfig.xml, and in this case we have set the
softCommit interval to 15 minutes. The Solr version is 6.4.1.

Thanks,
Sujay

On Fri, Nov 10, 2017 at 11:58 AM, Erick Erickson 
wrote:

> bq: We are getting below log without invoking commit operation after
> every partial update call
>
> Not sure what you mean here. If you're issuing a commit from the
> client every time you update a doc (or even a batch) that's an
> anti-pattern and you're opening searchers all the time. Don't do that
> ;).
>
> I'd set my autoCommit time to something reasonable like 60 seconds (or
> even 15) with openSearcher=false in solrconfig.xml. Set your soft
> commit to however long you can stand, I try for at least 10 seconds,
> but 60 or even 300 if possible, it all depends on how long after you
> index a document it has to be available for search.
>
> The settings you have are dangerous. See:
>
> https://lucidworks.com/2013/08/23/understanding-
> transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 9:25 PM, Sujay Bawaskar 
> wrote:
> > Hi,
> >
> > We are getting below log without invoking commit operation after every
> > partial update call. We have configured soft commit and commit time as
> > below. With below configuration we are able to perform 800 partial
> updates
> > per minutes which I think is very slow. Our Index size is 10GB for this
> > particular core.
> > Is there any configuration we are missing here?
> >
> > Log:
> > 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]
> > o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection]
> realtime]
> >
> > Commit configuration:
> > solr.autoCommit.maxTime:180
> > solr.autoSoftCommit.maxTime:90
> >
> >
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
Hi,

We are getting the below log without invoking a commit operation after every
partial update call. We have configured the soft commit and hard commit times
as below. With the below configuration we are able to perform 800 partial
updates per minute, which I think is very slow. Our index size is 10GB for
this particular core.
Is there any configuration we are missing here?

Log:
2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]
o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection] realtime]

Commit configuration:
solr.autoCommit.maxTime:180
solr.autoSoftCommit.maxTime:90
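
For reference, a minimal sketch of what one such partial (atomic) update
looks like from SolrJ (the URL, id and field names are placeholders);
batching several such documents per add call, rather than sending one
document per request, usually helps throughput:

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdateExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/collection").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-123");
            // "set" replaces the field value; "add", "inc" and "remove" are the other atomic operations
            doc.addField("status", Collections.singletonMap("set", "ACTIVE"));
            solr.add(doc);
            // No explicit commit here; autoCommit/autoSoftCommit make the change visible.
        }
    }
}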



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Issue with delta import

2017-07-26 Thread Sujay Bawaskar
Can you please try ${dih.last_index_time} instead of
${dataimporter.last_index_time}?
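
For example, something along these lines in the entity definition (the table
and column names are illustrative only):

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>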

On Wed, Jul 26, 2017 at 2:33 PM, bhargava ravali koganti <
ravali@gmail.com> wrote:

> Hi,
>
> I'm trying to integrate Solr and Cassandra. I"m facing problem with delta
> import. For every 10 minutes I'm running deltaquery using cron job. If any
> changes in the data based on last index time, it has to fetch the data(as
> far as my knowledge), however, it keeps fetching the whole data
> irrespective of changes.
>
> My problem:
> https://stackoverflow.com/questions/45304803/deltaimport-fetches-all-the-
> data
>
> Looking forward to hear from you.
>
> Thanks,
> Bhargava Ravali Koganti
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: Parent child documents partial update

2017-07-18 Thread Sujay Bawaskar
Yup, got it!

On Tue, Jul 18, 2017 at 12:22 PM, Amrit Sarkar 
wrote:

> Sujay,
>
> Lucene index is in flat-object document style, so I really not think nested
> documents at index / storage will ever be supported unless someone change
> the very intricacy of the index.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Jul 18, 2017 at 8:11 AM, Sujay Bawaskar 
> wrote:
>
> > Thanks Amrit. So storage mechanism of parent child documents is limiting
> > the capability of partial update. It would be great to have flawless
> parent
> > child index support in solr.
> >
> > On 17-Jul-2017 11:14 PM, "Amrit Sarkar"  wrote:
> >
> > > Sujay,
> > >
> > > Not really. Parent-child documents are stored in a single block
> > > contiguously. Read more about parent-child relationship at:
> > > https://medium.com/@sarkaramrit2/multiple-
> documents-with-same-doc-id-in-
> > > index-in-solr-cloud-32c072db2164
> > >
> > > While we perform partial / atomic update, say {"id":"X",
> > > "fieldA":{"set":"Z"}, that particular doc with X will be fetched (all
> the
> > > "stored" fields), update will be performed and indexed, all happens in
> > > *DistributedUpdateProcessor* internally. So there is no way it will
> fetch
> > > the child documents along with it.
> > >
> > > I am not sure whether this can be done with current code or it will be
> > > fixed / improved in the future.
> > >
> > > Amrit Sarkar
> > > Search Engineer
> > > Lucidworks, Inc.
> > > 415-589-9269
> > > www.lucidworks.com
> > > Twitter http://twitter.com/lucidworks
> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > >
> > > On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar <
> > sujaybawas...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Need a help to understand solr parent child document partial update
> > > > behaviour. Can we perform partial update on parent document without
> > > losing
> > > > its chiild documents? My observation is that parent child
> relationship
> > > > between documents get lost in case partial update is performed on
> > parent.
> > > > Any work around or solution to this issue?
> > > >
> > > > --
> > > > Thanks,
> > > > Sujay P Bawaskar
> > > > M:+91-77091 53669
> > > >
> > >
> >
>


Re: Parent child documents partial update

2017-07-17 Thread Sujay Bawaskar
Thanks Amrit. So the storage mechanism for parent-child documents is
limiting the capability of partial updates. It would be great to have
flawless parent-child index support in Solr.

On 17-Jul-2017 11:14 PM, "Amrit Sarkar"  wrote:

> Sujay,
>
> Not really. Parent-child documents are stored in a single block
> contiguously. Read more about parent-child relationship at:
> https://medium.com/@sarkaramrit2/multiple-documents-with-same-doc-id-in-
> index-in-solr-cloud-32c072db2164
>
> While we perform partial / atomic update, say {"id":"X",
> "fieldA":{"set":"Z"}, that particular doc with X will be fetched (all the
> "stored" fields), update will be performed and indexed, all happens in
> *DistributedUpdateProcessor* internally. So there is no way it will fetch
> the child documents along with it.
>
> I am not sure whether this can be done with current code or it will be
> fixed / improved in the future.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Mon, Jul 17, 2017 at 12:44 PM, Sujay Bawaskar 
> wrote:
>
> > Hi,
> >
> > Need a help to understand solr parent child document partial update
> > behaviour. Can we perform partial update on parent document without
> losing
> > its chiild documents? My observation is that parent child relationship
> > between documents get lost in case partial update is performed on parent.
> > Any work around or solution to this issue?
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
> >
>


Parent child documents partial update

2017-07-17 Thread Sujay Bawaskar
Hi,

I need some help understanding Solr's parent-child document partial update
behaviour. Can we perform a partial update on a parent document without
losing its child documents? My observation is that the parent-child
relationship between documents gets lost when a partial update is performed
on the parent. Is there any workaround or solution to this issue?

-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: DIH delta import with cache 5.3.1 issue

2017-06-20 Thread Sujay Bawaskar
Hi,

We did not encounter this issue with Solr 6.x. But a delta import with cache
executes the nested query for every element encountered in the parent query.
Since this select has no where clause (because we are using the cache), it
takes a long time, so a delta import with cache is very slow. My observation
is that the behaviour of delta import with caching is not similar to that of
full import with caching: if the delta query selects 10 elements, it is like
executing a select-all query on the database for each of those ten records.
Any comment on this behaviour of delta import?

On Thu, Mar 16, 2017 at 7:47 PM, Sujay Bawaskar <
sujay.bawas...@firstfuel.com> wrote:

> Thanks Alex. I will test it with 5.4 and 6.4 and let you know.
>
> On Thu, Mar 16, 2017 at 7:40 PM, Alexandre Rafalovitch  > wrote:
>
>> You have nested entities and accumulate the content of the inner
>> entities in the outer one with caching on an inner one. Your
>> description sounds like the inner cache is not reset on the next
>> iteration of the outer loop.
>>
>> This may be connected to
>> https://issues.apache.org/jira/browse/SOLR-7843 (Fixed in 5.4)
>>
>> Or it may be a different bug. I would make a simplest test case (based
>> on DIH-db example) and then try it on 5.3.1 and 5.4. And then 6.4 if
>> the problem is still there. If it is still there in 6.4, then we may
>> have a new bug.
>>
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>>
>> On 16 March 2017 at 09:17, Sujay Bawaskar 
>> wrote:
>> > This behaviour is for delta import only. One document get field values
>> of
>> > all documents. These fields are child entities which maps column to
>> multi
>> > valued fields.
>> >
>> > > > query="IMPORT_QUERY"
>> > deltaQuery="DELTA_QUERY"
>> > pk="buildingUserId"
>> > deletedPkQuery="DELETE_QUERY"
>> > onError="continue">
>> >
>> >  > > query="SELECT_QUERY"
>> > transformer="RegexTransformer" cacheImpl="SortedMapBackedCache"
>> > cacheKey="bldId" cacheLookup="user_building.plainBuildingId"
>> > onError="continue">
>> > 
>> > > > splitBy="," />
>> > > > dateTimeFormat="-MM-dd" />
>> > 
>> > 
>> >
>> > On Thu, Mar 16, 2017 at 6:35 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> >> Could you give a bit more details. Do you mean one document gets the
>> >> content of multiple documents? And only on delta?
>> >>
>> >> Regards,
>> >> Alex
>> >>
>> >> On 16 Mar 2017 8:53 AM, "Sujay Bawaskar" > >
>> >> wrote:
>> >>
>> >> Hi,
>> >>
>> >> We are using DIH with cache(SortedMapBackedCache) with solr 5.3.1. We
>> have
>> >> around 2.8 million documents in solr and total index size is 4 GB. DIH
>> >> delta import is dumping all values of mapped columns to their
>> respective
>> >> multi valued fields. This is causing size of one solr document upto 2
>> GB.
>> >> Is this a known issue with solr 5.3.1?
>> >>
>> >> Thanks,
>> >> Sujay
>> >>
>>
>
>


Re: Data Import

2017-03-17 Thread Sujay Bawaskar
Hi Vishal,

In my experience, DIH is the best option for indexing from an RDBMS into
Solr. DIH with caching gives the best performance, and DIH nested entities
allow you to define simple queries.
SolrJ is good when you want your RDBMS updates to be made available in Solr
immediately. A DIH full import can be used to index all the data the first
time, or to restore the index in case it gets corrupted.

Thanks,
Sujay

On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:

> Hi,
>
>
> I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> the available options are:
> 1) Post Tool
> 2) DIH
> 3) SolrJ (as ours is a J2EE application).
>
> I want to know what is the recommended way for Data import in production
> environment.
> Will sending data via SolrJ in batches be faster than posting a csv using
> POST tool?
>
>
> Thanks,
> Vishal
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Re: DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
Thanks Alex. I will test it with 5.4 and 6.4 and let you know.

On Thu, Mar 16, 2017 at 7:40 PM, Alexandre Rafalovitch 
wrote:

> You have nested entities and accumulate the content of the inner
> entities in the outer one with caching on an inner one. Your
> description sounds like the inner cache is not reset on the next
> iteration of the outer loop.
>
> This may be connected to
> https://issues.apache.org/jira/browse/SOLR-7843 (Fixed in 5.4)
>
> Or it may be a different bug. I would make a simplest test case (based
> on DIH-db example) and then try it on 5.3.1 and 5.4. And then 6.4 if
> the problem is still there. If it is still there in 6.4, then we may
> have a new bug.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 16 March 2017 at 09:17, Sujay Bawaskar 
> wrote:
> > This behaviour is for delta import only. One document get field values of
> > all documents. These fields are child entities which maps column to multi
> > valued fields.
> >
> >  > query="IMPORT_QUERY"
> > deltaQuery="DELTA_QUERY"
> > pk="buildingUserId"
> > deletedPkQuery="DELETE_QUERY"
> > onError="continue">
> >
> >   > query="SELECT_QUERY"
> > transformer="RegexTransformer" cacheImpl="SortedMapBackedCache"
> > cacheKey="bldId" cacheLookup="user_building.plainBuildingId"
> > onError="continue">
> > 
> >  > splitBy="," />
> >  > dateTimeFormat="-MM-dd" />
> > 
> > 
> >
> > On Thu, Mar 16, 2017 at 6:35 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Could you give a bit more details. Do you mean one document gets the
> >> content of multiple documents? And only on delta?
> >>
> >> Regards,
> >> Alex
> >>
> >> On 16 Mar 2017 8:53 AM, "Sujay Bawaskar" 
> >> wrote:
> >>
> >> Hi,
> >>
> >> We are using DIH with cache(SortedMapBackedCache) with solr 5.3.1. We
> have
> >> around 2.8 million documents in solr and total index size is 4 GB. DIH
> >> delta import is dumping all values of mapped columns to their respective
> >> multi valued fields. This is causing size of one solr document upto 2
> GB.
> >> Is this a known issue with solr 5.3.1?
> >>
> >> Thanks,
> >> Sujay
> >>
>


Re: DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
This behaviour is for delta import only: one document gets the field values
of all documents. These fields come from child entities which map columns to
multi-valued fields. The relevant part of the data-config.xml (with names
elided) is:

<entity ... query="IMPORT_QUERY"
        deltaQuery="DELTA_QUERY"
        pk="buildingUserId"
        deletedPkQuery="DELETE_QUERY"
        onError="continue">

  <entity ... query="SELECT_QUERY"
          transformer="RegexTransformer" cacheImpl="SortedMapBackedCache"
          cacheKey="bldId" cacheLookup="user_building.plainBuildingId"
          onError="continue">
    <field ... />
    <field ... splitBy="," />
    <field ... dateTimeFormat="-MM-dd" />
  </entity>
</entity>

On Thu, Mar 16, 2017 at 6:35 PM, Alexandre Rafalovitch 
wrote:

> Could you give a bit more details. Do you mean one document gets the
> content of multiple documents? And only on delta?
>
> Regards,
> Alex
>
> On 16 Mar 2017 8:53 AM, "Sujay Bawaskar" 
> wrote:
>
> Hi,
>
> We are using DIH with cache(SortedMapBackedCache) with solr 5.3.1. We have
> around 2.8 million documents in solr and total index size is 4 GB. DIH
> delta import is dumping all values of mapped columns to their respective
> multi valued fields. This is causing size of one solr document upto 2 GB.
> Is this a known issue with solr 5.3.1?
>
> Thanks,
> Sujay
>


DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
Hi,

We are using DIH with cache(SortedMapBackedCache) with solr 5.3.1. We have
around 2.8 million documents in solr and total index size is 4 GB. DIH
delta import is dumping all values of mapped columns to their respective
multi valued fields. This is causing size of one solr document upto 2 GB.
Is this a known issue with solr 5.3.1?

Thanks,
Sujay