wow, thanks for sharing! Best,
Jörg On Thu, Apr 30, 2015 at 10:43 PM, GWired <garrettcjohn...@gmail.com> wrote: > The below will build a table in SQL to store Refresh times. The first > time it runs it will put in an entry and going backwards in time until all > records are retrieved. Once compete it will retrieve based on fields > lastmodifieddate and createddate. The time based chunks allow ES enough > time to process based on your window. Replace your tables time fields with > lastmodifieddate and createddate. This could be genericized to pass in the > field names of LMD and CD. > > In your SQL DB create a table to hold your refreshtime data: > > ElasticSearchRefreshTime > > CREATE TABLE [dbo].[ElasticSearchRefreshTime]( > [id] [int] IDENTITY(1,1) NOT NULL, > [DataObjectID] [nvarchar](100) NOT NULL, > [LastRefreshDateTime] [datetime2](7) NULL, > [RebuildSearchIndex] [bit] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_RebuildSearchIndex] DEFAULT ((1)), > [PreviousRunTime] [datetime2](7) NULL, > [NumberofExecutionsSinceRebuild] [bigint] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_NumberofExecutionsSinceRebuild] > DEFAULT ((0)), > [RebuildMonth] [int] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_RebuildPageSize] DEFAULT ((0)), > [RebuildYear] [int] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_RebuildTransferPostion] DEFAULT > ((0)), > [RebuildTotalCompleted] [bigint] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_RebuildTotalCompleted] DEFAULT ((0)), > [InProcess] [bit] NOT NULL CONSTRAINT > [DF_Metadata_ElasticSearchRefreshTime_InProcess] DEFAULT ((0)), > CONSTRAINT [PK_Metadata_ElasticSearchRefreshTime_1] PRIMARY KEY CLUSTERED > ( > [DataObjectID] ASC > )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = > OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] > ) ON [PRIMARY] > > Then create your procedure: > > Create PROCEDURE [dbo].[ESTransport] > @EntityName as nvarchar(50) > AS > BEGIN > -- SET NOCOUNT ON added to prevent extra result sets from > -- interfering with SELECT statements. > SET NOCOUNT ON; > declare @lastrefreshdatetime datetime2 > declare @RebuildSearchIndex bit > declare @SprocEntryTime Datetime2 = SYSDATETIME() > declare @RebuildMonth int = 0 > declare @RebuildYear int = 0 > declare @InProcess bit = 0 > declare @TotalRows bigint = 0 > declare @RebuildTotalCompleted bigint = 0 > declare @MaxDate Datetime2 > -- select count(1),max(createddate) from [Case] > declare @TotalSQL nvarchar(200) > declare @ParmDefinition nvarchar(255) > SET @TotalSQL = 'select @TotalRowsOut = count(1), @MaxDateOut = > Max(CreatedDate) from [' + @EntityName +'] where isDeleted=''false''' > SET @ParmDefinition = N'@TotalRowsOut bigint OUTPUT, > @MaxDateOut datetime2 OUTPUT' > EXECUTE sp_executesql > @TotalSQL, > @ParmDefinition, > @TotalRowsOut = @TotalRows OutPut, > @MaxDateOut = @MaxDate OUTPUT > PRINT @TotalRows > PRINT @MaxDate > > select > @lastrefreshdatetime=[LastRefreshDateTime],@RebuildMonth=RebuildMonth, > @RebuildYear=RebuildYear, @InProcess = > InProcess,@RebuildSearchIndex=[RebuildSearchIndex],@RebuildTotalCompleted=RebuildTotalCompleted > from [Metadata_ElasticSearchRefreshTime] where DataObjectID=@EntityName > > IF @lastrefreshdatetime is null > BEGIN > SET @RebuildSearchIndex=1 > SET @InProcess = 0 > INSERT INTO [dbo].[Metadata_ElasticSearchRefreshTime] > ([DataObjectID] > ,[LastRefreshDateTime] > ,[RebuildSearchIndex] > ,[NumberofExecutionsSinceRebuild] > ,[RebuildMonth] > ,[RebuildYear] > ,[RebuildTotalCompleted] > ,[InProcess]) > VALUES > (@EntityName > ,@SprocEntryTime > ,@RebuildSearchIndex > ,0 > ,MONTH(@MaxDate) > ,YEAR(@MaxDate) > ,0 > ,@InProcess) > END > -- Begin statements > > IF @EntityName = 'Contact' and @RebuildSearchIndex = 1 > BEGIN > IF @InProcess=0 > BEGIN > --First Pass here Direction is descending > SET @InProcess=1 > SET @RebuildMonth = Month(@MaxDate) > SET @RebuildYear = Year(@MaxDate) > END > ELSE IF @InProcess=1 > BEGIN > --If you hit January drop to December of the prior year > IF @RebuildMonth=1 > BEGIN > SET @RebuildMonth = 12 > SET @RebuildYear = @RebuildYear-1 > END > ELSE > BEGIN > SET @RebuildMonth =@RebuildMonth-1 > END > END > > SELECT Entity.id as _id,Entity.* > FROM [MyDB].[dbo].[Entity] WITH (NOLOCK) > and Month(contact.CreatedDate)=@RebuildMonth and Year(contact.CreatedDate) > = @RebuildYear > > UPDATE [dbo].[Metadata_ElasticSearchRefreshTime] > SET [DataObjectID] = @EntityName > ,[LastRefreshDateTime] = @SprocEntryTime > ,[RebuildSearchIndex] = CASE WHEN ((@RebuildTotalCompleted + > @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END > ,[RebuildMonth] = @RebuildMonth > ,[RebuildYear] = @RebuildYear > ,[PreviousRunTime] = @lastrefreshdatetime > ,[NumberofExecutionsSinceRebuild] = 0 > ,[InProcess] = CASE WHEN ((@RebuildTotalCompleted + > @@ROWCOUNT)<@TotalRows) THEN 1 ELSE 0 END > ,[RebuildTotalCompleted] = CASE WHEN (@RebuildTotalCompleted=0) THEN > @@ROWCOUNT ELSE @RebuildTotalCompleted+@@ROWCOUNT END > WHERE DataObjectID=@EntityName > END > ELSE IF @EntityName = 'Entity' > BEGIN > UPDATE [dbo].[Metadata_ElasticSearchRefreshTime] > SET [DataObjectID] = @EntityName > ,[LastRefreshDateTime] = @SprocEntryTime > ,[RebuildSearchIndex] = 0 > ,PreviousRunTime = @lastrefreshdatetime > ,NumberofExecutionsSinceRebuild = NumberofExecutionsSinceRebuild+1 > WHERE DataObjectID=@EntityName > > SELECT Entity.id as _id,Entity.* > FROM [MyDB].[dbo].[Entity] WITH (NOLOCK) > and ([Contact].LastModifiedDate >= @lastrefreshdatetime or > [Contact].CreatedDate >=@lastrefreshdatetime) > END > END > > Finally in Sense I create the river (i used integrated security): > > PUT _river/Entity/_meta > { > "type":"jdbc", > "jdbc": { > "url":"jdbc:sqlserver://dbserver.mydomain.com:1433 > ;databaseName=MyDB;integratedSecurity=true;", > "sql":{ > "callable" : true, > "statement" : "{call ESTransport(?)}", > "parameter" : "Entity" > }, > "schedule":"0/30 0-59 0-23 ? * *", > "autocommit":true, > "index":"jdbc", > "type":"Entity" > } > } > > > > On Wednesday, April 29, 2015 at 7:42:33 AM UTC-5, Jörg Prante wrote: >> >> Nice work, can you share the recipe with the community? >> >> I could post it on the JDBC plugin wiki >> >> Jörg >> >> On Wed, Apr 29, 2015 at 1:56 PM, GWired <garrett...@gmail.com> wrote: >> >>> My theory is that i was overloading my ES VM's on initial loads or when >>> doing large loads. >>> >>> My cpu would jump to 99% and during the pulls it would fail silently and >>> the river thought it got all the documents but didn't. >>> >>> I have since rewritten my rivers using Stored procedures instead and it >>> pulls much faster and CPU doesn't go to high. >>> >>> The procedure keeps track of the created_at and updated_at for me >>> instead of using the column strategy. I also chunked my initial calls into >>> logical batches. Mine happen to be date based by month. I couldn't figure >>> out a way that made sense using row sizes or anything and month seemed just >>> as good as any and it works. >>> >>> This also solved my problem of when / ES is turned off. It now will >>> just start back up where it last left off because SQL server is storing >>> that info instead of ES. I also made it so I can rebuild at will. >>> >>> Thanks Garrett >>> >>> >>> >>> On Thursday, April 23, 2015 at 2:48:03 AM UTC-4, Jörg Prante wrote: >>>> >>>> There are log messages at ES cluster side, you should look there why >>>> bulk indexing failed. >>>> >>>> Jörg >>>> >>>> On Thu, Apr 23, 2015 at 5:45 AM, GWired <garrett...@gmail.com> wrote: >>>> >>>>> Found this in the logs: >>>>> >>>>> [2015-04-22 22:01:25,063][ERROR][river.jdbc.BulkNodeClient] bulk [15] >>>>> failed with 945 failed items, failure message = failure in bulk execution: >>>>> >>>>> >>>>> >>>>> On Wednesday, April 22, 2015 at 7:53:25 PM UTC-5, GWired wrote: >>>>>> >>>>>> Hi All, >>>>>> >>>>>> I've just been informed that i'm off by up to 100k records or so in >>>>>> my jdbc river fed index. >>>>>> >>>>>> I am using the column strategy using a createddate and lastmodified >>>>>> date. >>>>>> >>>>>> Kibana is reporting an entirely different # than what i see reported >>>>>> in the DB.. >>>>>> >>>>>> Table A has 978634 in SQL, 934646 shown in Kibana. >>>>>> Table B has 957327 in SQL, 876725 shown in Kibana. >>>>>> Table C has 312826 in SQL, 238534 shown in Kibana >>>>>> >>>>>> I see in the ES logs >>>>>> >>>>>> Table A metrics: 979044 rows, >>>>>> Table B metrics: 957591 rows >>>>>> Table C metrics: 312827 rows, >>>>>> >>>>>> These are the right numbers...well at least closer to right. >>>>>> >>>>>> But if i do this using Sense: >>>>>> >>>>>> GET jdbc/mytable/_count?q=* >>>>>> >>>>>> It returns the same # as Kibana is return. >>>>>> >>>>>> This erring version is running on ES 1.5.1 with Kibana version 3.0 >>>>>> >>>>>> On another server with ES 1.5.0 and Kibana 3.0 it is working just >>>>>> fine #'s match up. >>>>>> >>>>>> Any ideas? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "elasticsearch" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to elasticsearc...@googlegroups.com. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/elasticsearch/5561753d-9553-4bc5-bea2-102b7e030396%40googlegroups.com >>>>> <https://groups.google.com/d/msgid/elasticsearch/5561753d-9553-4bc5-bea2-102b7e030396%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "elasticsearch" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to elasticsearc...@googlegroups.com. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/elasticsearch/5c2016b5-ef5f-4f54-ac37-b2991eba0775%40googlegroups.com >>> <https://groups.google.com/d/msgid/elasticsearch/5c2016b5-ef5f-4f54-ac37-b2991eba0775%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- > You received this message because you are subscribed to the Google Groups > "elasticsearch" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to elasticsearch+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elasticsearch/ca6554c1-5cd9-46e2-af65-6c12a2bd23f9%40googlegroups.com > <https://groups.google.com/d/msgid/elasticsearch/ca6554c1-5cd9-46e2-af65-6c12a2bd23f9%40googlegroups.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGkLaqcjFNeqwbuxNPk7NjUWJanr%3D5DDez7YFL-4KPaVw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.