Re: JDBC River missing documents??

joergpra...@gmail.com Thu, 30 Apr 2015 14:59:39 -0700

wow, thanks for sharing!

Best,


Jörg

On Thu, Apr 30, 2015 at 10:43 PM, GWired <garrettcjohn...@gmail.com> wrote:

> The below will build a table in SQL to store Refresh times.  The first
> time it runs it will put in an entry and going backwards in time until all
> records are retrieved.  Once compete it  will retrieve based on fields
> lastmodifieddate and createddate. The time based chunks allow ES enough
> time to process based on your window.  Replace your tables time fields with
> lastmodifieddate and createddate.  This could be genericized to pass in the
> field names of LMD and CD.
>
> In your SQL DB create a table to hold your refreshtime data:
>
> ElasticSearchRefreshTime
>
> CREATE TABLE [dbo].[ElasticSearchRefreshTime](
> [id] [int] IDENTITY(1,1) NOT NULL,
> [DataObjectID] [nvarchar](100) NOT NULL,
> [LastRefreshDateTime] [datetime2](7) NULL,
> [RebuildSearchIndex] [bit] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_RebuildSearchIndex]  DEFAULT ((1)),
> [PreviousRunTime] [datetime2](7) NULL,
> [NumberofExecutionsSinceRebuild] [bigint] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_NumberofExecutionsSinceRebuild]
>  DEFAULT ((0)),
> [RebuildMonth] [int] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_RebuildPageSize]  DEFAULT ((0)),
> [RebuildYear] [int] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_RebuildTransferPostion]  DEFAULT
> ((0)),
> [RebuildTotalCompleted] [bigint] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_RebuildTotalCompleted]  DEFAULT ((0)),
> [InProcess] [bit] NOT NULL CONSTRAINT
> [DF_Metadata_ElasticSearchRefreshTime_InProcess]  DEFAULT ((0)),
>  CONSTRAINT [PK_Metadata_ElasticSearchRefreshTime_1] PRIMARY KEY CLUSTERED
> (
> [DataObjectID] ASC
> )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
> OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
> ) ON [PRIMARY]
>
> Then create your procedure:
>
> Create PROCEDURE [dbo].[ESTransport]
> @EntityName as nvarchar(50)
> AS
> BEGIN
> -- SET NOCOUNT ON added to prevent extra result sets from
> -- interfering with SELECT statements.
> SET NOCOUNT ON;
> declare @lastrefreshdatetime datetime2
> declare @RebuildSearchIndex bit
> declare @SprocEntryTime Datetime2 = SYSDATETIME()
> declare @RebuildMonth int = 0
> declare @RebuildYear int = 0
> declare @InProcess bit = 0
> declare @TotalRows bigint = 0
> declare @RebuildTotalCompleted bigint = 0
> declare @MaxDate Datetime2
> -- select count(1),max(createddate) from [Case]
> declare @TotalSQL nvarchar(200)
> declare @ParmDefinition nvarchar(255)
> SET @TotalSQL = 'select @TotalRowsOut = count(1), @MaxDateOut =
> Max(CreatedDate) from [' +  @EntityName +'] where isDeleted=''false'''
> SET @ParmDefinition = N'@TotalRowsOut bigint OUTPUT,
>                             @MaxDateOut datetime2 OUTPUT'
> EXECUTE sp_executesql
> @TotalSQL,
> @ParmDefinition,
> @TotalRowsOut = @TotalRows   OutPut,
> @MaxDateOut = @MaxDate OUTPUT
>  PRINT @TotalRows
> PRINT @MaxDate
>
> select
> @lastrefreshdatetime=[LastRefreshDateTime],@RebuildMonth=RebuildMonth,
> @RebuildYear=RebuildYear, @InProcess =
> InProcess,@RebuildSearchIndex=[RebuildSearchIndex],@RebuildTotalCompleted=RebuildTotalCompleted
> from [Metadata_ElasticSearchRefreshTime] where DataObjectID=@EntityName
>
> IF @lastrefreshdatetime is null
> BEGIN
> SET @RebuildSearchIndex=1
> SET @InProcess = 0
> INSERT INTO [dbo].[Metadata_ElasticSearchRefreshTime]
> ([DataObjectID]
> ,[LastRefreshDateTime]
> ,[RebuildSearchIndex]
> ,[NumberofExecutionsSinceRebuild]
> ,[RebuildMonth]
> ,[RebuildYear]
> ,[RebuildTotalCompleted]
> ,[InProcess])
> VALUES
> (@EntityName
> ,@SprocEntryTime
> ,@RebuildSearchIndex
> ,0
> ,MONTH(@MaxDate)
> ,YEAR(@MaxDate)
> ,0
> ,@InProcess)
> END
>  -- Begin statements
>
> IF @EntityName = 'Contact' and @RebuildSearchIndex  = 1
> BEGIN
> IF @InProcess=0
> BEGIN
> --First Pass here Direction is descending
> SET @InProcess=1
> SET @RebuildMonth = Month(@MaxDate)
> SET @RebuildYear = Year(@MaxDate)
> END
> ELSE IF @InProcess=1
> BEGIN
> --If you hit January drop to December of the prior year
> IF @RebuildMonth=1
> BEGIN
> SET @RebuildMonth = 12
> SET @RebuildYear = @RebuildYear-1
> END
> ELSE
> BEGIN
> SET @RebuildMonth =@RebuildMonth-1
> END
> END
>
> SELECT Entity.id as _id,Entity.*
> FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
> and Month(contact.CreatedDate)=@RebuildMonth and Year(contact.CreatedDate)
> = @RebuildYear
>
> UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
>    SET [DataObjectID] = @EntityName
>   ,[LastRefreshDateTime] = @SprocEntryTime
>   ,[RebuildSearchIndex] = CASE WHEN ((@RebuildTotalCompleted +
> @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
>   ,[RebuildMonth] = @RebuildMonth
>   ,[RebuildYear] = @RebuildYear
>   ,[PreviousRunTime] = @lastrefreshdatetime
>   ,[NumberofExecutionsSinceRebuild]  = 0
>   ,[InProcess] = CASE WHEN ((@RebuildTotalCompleted +
> @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END
>   ,[RebuildTotalCompleted] = CASE WHEN (@RebuildTotalCompleted=0) THEN
> @@ROWCOUNT ELSE @RebuildTotalCompleted+@@ROWCOUNT END
>  WHERE DataObjectID=@EntityName
> END
> ELSE IF @EntityName = 'Entity'
> BEGIN
> UPDATE [dbo].[Metadata_ElasticSearchRefreshTime]
>    SET [DataObjectID] = @EntityName
>   ,[LastRefreshDateTime] = @SprocEntryTime
>   ,[RebuildSearchIndex] = 0
>   ,PreviousRunTime = @lastrefreshdatetime
>   ,NumberofExecutionsSinceRebuild = NumberofExecutionsSinceRebuild+1
>  WHERE DataObjectID=@EntityName
>
>                  SELECT Entity.id as _id,Entity.*
>          FROM [MyDB].[dbo].[Entity] WITH (NOLOCK)
> and ([Contact].LastModifiedDate >= @lastrefreshdatetime or
> [Contact].CreatedDate >=@lastrefreshdatetime)
> END
> END
>
> Finally in Sense I create the river (i used integrated security):
>
> PUT _river/Entity/_meta
> {
>     "type":"jdbc",
>     "jdbc": {
>         "url":"jdbc:sqlserver://dbserver.mydomain.com:1433
> ;databaseName=MyDB;integratedSecurity=true;",
>         "sql":{
>         "callable" : true,
>         "statement" : "{call ESTransport(?)}",
>         "parameter" : "Entity"
>         },
> "schedule":"0/30 0-59 0-23 ? * *",
> "autocommit":true,
>         "index":"jdbc",
>         "type":"Entity"
> }
> }
>
>
>
> On Wednesday, April 29, 2015 at 7:42:33 AM UTC-5, Jörg Prante wrote:
>>
>> Nice work, can you share the recipe with the community?
>>
>> I could post it on the JDBC plugin wiki
>>
>> Jörg
>>
>> On Wed, Apr 29, 2015 at 1:56 PM, GWired <garrett...@gmail.com> wrote:
>>
>>> My theory is that i was overloading my ES VM's on initial loads or when
>>> doing large loads.
>>>
>>> My cpu would jump to 99% and during the pulls it would fail silently and
>>> the river thought it got all the documents but didn't.
>>>
>>> I have since rewritten my rivers using Stored procedures instead and it
>>> pulls much faster and CPU doesn't go to high.
>>>
>>> The procedure keeps track of the created_at and updated_at for me
>>> instead of using the column strategy.  I also chunked my initial calls into
>>> logical batches.  Mine happen to be date based by month.  I couldn't figure
>>> out a way that made sense using row sizes or anything and month seemed just
>>> as good as any and it works.
>>>
>>> This also solved my problem of when / ES is turned off.  It now will
>>> just start back up where it last left off because SQL server is storing
>>> that info instead of ES.  I also made it so I can rebuild at will.
>>>
>>> Thanks Garrett
>>>
>>>
>>>
>>> On Thursday, April 23, 2015 at 2:48:03 AM UTC-4, Jörg Prante wrote:
>>>>
>>>> There are log messages at ES cluster side, you should look there why
>>>> bulk indexing failed.
>>>>
>>>> Jörg
>>>>
>>>> On Thu, Apr 23, 2015 at 5:45 AM, GWired <garrett...@gmail.com> wrote:
>>>>
>>>>> Found this in the logs:
>>>>>
>>>>> [2015-04-22 22:01:25,063][ERROR][river.jdbc.BulkNodeClient] bulk [15]
>>>>> failed with 945 failed items, failure message = failure in bulk execution:
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, April 22, 2015 at 7:53:25 PM UTC-5, GWired wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I've just been informed that i'm off by up to 100k records or so in
>>>>>> my jdbc river fed index.
>>>>>>
>>>>>> I am using the column strategy using a createddate and lastmodified
>>>>>> date.
>>>>>>
>>>>>> Kibana is reporting an entirely different # than what i see reported
>>>>>> in the DB..
>>>>>>
>>>>>> Table A has 978634 in SQL, 934646 shown in Kibana.
>>>>>> Table B has 957327 in SQL, 876725 shown in Kibana.
>>>>>> Table C has 312826 in SQL, 238534 shown in Kibana
>>>>>>
>>>>>> I see in the ES logs
>>>>>>
>>>>>> Table A metrics: 979044 rows,
>>>>>> Table B metrics: 957591 rows
>>>>>> Table C metrics: 312827 rows,
>>>>>>
>>>>>> These are the right numbers...well at least closer to right.
>>>>>>
>>>>>> But if i do this using Sense:
>>>>>>
>>>>>> GET jdbc/mytable/_count?q=*
>>>>>>
>>>>>> It returns the same # as Kibana is return.
>>>>>>
>>>>>> This erring version is running on ES 1.5.1 with Kibana version 3.0
>>>>>>
>>>>>> On another server with ES 1.5.0 and Kibana 3.0 it is working just
>>>>>> fine #'s match up.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to elasticsearc...@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/5561753d-9553-4bc5-bea2-102b7e030396%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/5561753d-9553-4bc5-bea2-102b7e030396%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/5c2016b5-ef5f-4f54-ac37-b2991eba0775%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/5c2016b5-ef5f-4f54-ac37-b2991eba0775%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ca6554c1-5cd9-46e2-af65-6c12a2bd23f9%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/ca6554c1-5cd9-46e2-af65-6c12a2bd23f9%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGkLaqcjFNeqwbuxNPk7NjUWJanr%3D5DDez7YFL-4KPaVw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: JDBC River missing documents??

Reply via email to