[ 
https://issues.apache.org/jira/browse/GOBBLIN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492905#comment-17492905
 ] 

Kengo Seki commented on GOBBLIN-1367:
-------------------------------------

Hi [~shirshanka], are you going to address this issue?
I've already submitted a PR on it, so I'd be glad if you review it.
https://github.com/apache/gobblin/pull/3209


> Fix PostgresqlExtractor's unnecessary string replacement
> --------------------------------------------------------
>
>                 Key: GOBBLIN-1367
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1367
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-sql
>            Reporter: Kengo Seki
>            Assignee: Shirshanka Das
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> I tried to run PostgreSQL Source/Extractor with the following minimal config,
> {code}
> $ psql
> psql (12.5 (Ubuntu 12.5-0ubuntu0.20.04.1))
> Type "help" for help.
> sekikn=# \c
> You are now connected to database "sekikn" as user "sekikn".
> sekikn=# \d
>  public | t    | table | sekikn
> sekikn=# \d t
>  c      | character varying |           |          | 
> {code}
> {code}
> $ cat ~/postgres_to_console.job 
> source.class=org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource
> source.conn.driver=org.postgresql.Driver
> source.conn.username=sekikn
> source.conn.password=********
> source.conn.host=localhost
> source.conn.port=5432
> source.conn.database=sekikn
> source.querybased.schema=public
> source.entity=t
> extract.table.type=SNAPSHOT_APPEND
> writer.builder.class=org.apache.gobblin.writer.ConsoleWriterBuilder
> data.publisher.type=org.apache.gobblin.publisher.NoopPublisher
> {code}
> but came across the following error.
> {code}
> $ bin/gobblin cli run -jobName postgres_to_console -jobFile 
> ~/postgres_to_console.job
> (snip)
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Extract metadata using JDBC
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Executing query:select col.column_name, col.data_type, case when 
> CHARACTER_OCTET_LENGTH is null then 0 else 0 end as length, case when 
> NUMERIC_PRECISION is null then 0 else NUMERIC_PRECISION end as precesion, 
> case when NUMERIC_SCALE is null then 0 else NUMERIC_SCALE end as scale, case 
> when is_nullable='NO' then 'false' else 'true' end as nullable, '' as format, 
> '' as comment from information_schema.COLUMNS col WHERE 
> upper(col.table_name)=upper(?) AND upper(col.table_schema)=upper(?) order by 
> col.ORDINAL_POSITION
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Schema:[{"columnName":"c","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false}]
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Extract query: SELECT c FROM public.t where ('$WATERMARK')
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - Get 
> source record count using JDBC
> 2021-01-21 07:45:40 JST INFO  [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Executing query:SELECT COUNT(1) FROM publiCOUNT(1).t where (1=1)
> 2021-01-21 07:45:40 JST ERROR [gobblin-instance-driver] 
> org.apache.gobblin.source.jdbc.JdbcExtractor [public_t_1611182740226_0] - 
> Failed to execute sql:SELECT COUNT(1) FROM publiCOUNT(1).t where (1=1) 
> ;error-ERROR: syntax error at or near "."
>   Position: 35
> org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
>   Position: 35
>       at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2477)
>       at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
>       at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
>       at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
>       at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
>       at 
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:301)
>       at 
> org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:287)
>       at 
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:264)
>       at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:260)
>       at 
> org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
>       at 
> org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
>       at 
> org.apache.gobblin.source.jdbc.JdbcExtractor.executeSql(JdbcExtractor.java:666)
>       at 
> org.apache.gobblin.source.jdbc.JdbcExtractor.getSourceCount(JdbcExtractor.java:797)
>       at 
> org.apache.gobblin.source.extractor.extract.QueryBasedExtractor.build(QueryBasedExtractor.java:326)
>       at 
> org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource.getExtractor(PostgresqlSource.java:47)
>       at 
> org.apache.gobblin.runtime.TaskContext.getExtractor(TaskContext.java:120)
>       at org.apache.gobblin.runtime.Task.<init>(Task.java:196)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTaskRunnable(GobblinMultiTaskAttempt.java:485)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.access$100(GobblinMultiTaskAttempt.java:86)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt$2.call(GobblinMultiTaskAttempt.java:511)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt$2.call(GobblinMultiTaskAttempt.java:505)
>       at 
> com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
>       at com.github.rholder.retry.Retryer.call(Retryer.java:160)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.createTaskWithRetry(GobblinMultiTaskAttempt.java:505)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(GobblinMultiTaskAttempt.java:421)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.run(GobblinMultiTaskAttempt.java:157)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runAndOptionallyCommitTaskAttempt(GobblinMultiTaskAttempt.java:521)
>       at 
> org.apache.gobblin.runtime.GobblinMultiTaskAttempt.runWorkUnits(GobblinMultiTaskAttempt.java:563)
>       at 
> org.apache.gobblin.runtime.local.LocalJobLauncher.runWorkUnitStream(LocalJobLauncher.java:156)
>       at 
> org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:508)
>       at 
> org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:164)
>       at 
> org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:152)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> 2021-01-21 07:45:40 JST ERROR [gobblin-instance-driver] 
> org.apache.gobblin.source.extractor.extract.jdbc.PostgresqlSource 
> [public_t_1611182740226_0] - Failed to prepare extractor: error - Failed to 
> get record count; error - Failed to get source record count using JDBC; error 
> - Failed to get source record count from database; error - null
> {code}
> This is because the target schema and table (publi"c".t) contains the name of 
> column (c), so it was replaced wrongly. PostgresqlExtractor should replace 
> only the first occurrence of the target substring here.
>  
> [https://github.com/apache/incubator-gobblin/blob/release-0.15.0/gobblin-modules/gobblin-sql/src/main/java/org/apache/gobblin/source/jdbc/PostgresqlExtractor.java#L134]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to