Re: yarn-session mode: consuming Kafka data via the Python API fails

2019-12-09 · posted by 改改
Hi Wei Zhong,
Many thanks, it finally runs. Thank you for so patiently walking a beginner through this.
It turned out I had copied flink-sql-connector-kafka from my compiled source tree
instead of flink-sql-connector-kafka-0.11, so the versions did not match.
Thanks again, and best wishes with your work.
--
From: Wei Zhong
Sent: Tuesday, December 10, 2019, 10:23
To: 改改
Cc: user-zh
Subject: Re: yarn-session mode: consuming Kafka data via the Python API fails

Hi 改改,

Judging from the current error, this looks like a Kafka version mismatch: the Kafka
connector jar you put into the lib directory needs to be the 0.11 version, i.e.
flink-sql-connector-kafka-0.11_2.11-1.9.1.jar
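As a reference, here is a small sketch of how the versioned connector jar name is assembled. The Scala and Flink versions are taken from this thread (Scala 2.11, Flink 1.9.1), and `kafka_connector_jar` is a hypothetical helper for illustration, not a Flink API:

```python
# Hypothetical helper: builds the name of the versioned Kafka SQL connector jar
# that must be copied into flink/lib. The kafka_version piece has to match the
# connector.version requested in the table descriptor (here "0.11").
def kafka_connector_jar(kafka_version="0.11", scala_version="2.11", flink_version="1.9.1"):
    return "flink-sql-connector-kafka-{}_{}-{}.jar".format(
        kafka_version, scala_version, flink_version)

# For this thread's setup (Flink 1.9.1, Scala 2.11, Kafka 0.11):
print(kafka_connector_jar())  # flink-sql-connector-kafka-0.11_2.11-1.9.1.jar
```

Note that a yarn-session started before the jar was copied in will not pick it up; the session has to be restarted.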


On 2019-12-10 at 10:06, 改改 wrote:
Hi Wei Zhong,
Thanks for your reply. The Kafka connector jar is already in Flink's lib directory; my flink/lib directory contains the following files:

 [inline image: 5600791664319709.png, listing of flink/lib]

My cluster environment is as follows:
 java: 1.8.0_231
 flink: 1.9.1
 Python: 3.6.9
 Hadoop: 3.1.1.3.1.4.0-315

I tried again yesterday with python3.6, and it still fails with the following error:

[root@hdp02 data_team_workspace]# /opt/flink-1.9.1/bin/flink run -py tumble_window.py
Starting execution of program
Traceback (most recent call last):
  File "/tmp/pyflink/3fb6ccfd-482f-4426-859a-ebe003e14769/pyflink.zip/pyflink/util/exceptions.py", line 147, in deco
  File "/tmp/pyflink/3fb6ccfd-482f-4426-859a-ebe003e14769/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.registerTableSource.
: org.apache.flink.table.api.TableException: findAndCreateTableSource failed.
 at org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:67)
 at org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:54)
 at org.apache.flink.table.descriptors.ConnectTableDescriptor.registerTableSource(ConnectTableDescriptor.java:69)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
 at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
 at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
 at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.factories.TableSourceFactory' in the classpath.
Reason: No context matches.
The following properties are requested:
connector.properties.0.key=zookeeper.connect
connector.properties.0.value=hdp03:2181
connector.properties.1.key=bootstrap.servers
connector.properties.1.value=hdp02:6667
connector.property-version=1
connector.startup-mode=earliest-offset
connector.topic=user_01
connector.type=kafka
connector.version=0.11
format.fail-on-missing-field=true
format.json-schema={  type: 'object',  properties: {col1: {  type: 'string'},col2: {  type: 'string'},col3: {  type: 'string'},time: {  type: 'string',  format: 'date-time'}  }}
format.property-version=1
format.type=json
schema.0.name=rowtime
schema.0.rowtime.timestamps.from=time
schema.0.rowtime.timestamps.type=from-field
schema.0.rowtime.watermarks.delay=6
schema.0.rowtime.watermarks.type=periodic-bounded
schema.0.type=TIMESTAMP
schema.1.name=col1
schema.1.type=VARCHAR
schema.2.name=col2
schema.2.type=VARCHAR
schema.3.name=col3
schema.3.type=VARCHAR
update-mode=append
The following factories have been considered:
org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory
org.apache.flink.formats.csv.CsvRowFormatFactory
org.apache.flink.addons.hbase.HBaseTableFactory
org.apache.flink.api.java.io.jdbc.JDBCTableSourceSinkFactory
org.apache.flink.formats.json.JsonRowFormatFactory
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
 at org.apache.flink.table.factories.TableFactoryService.filterByContext(TableFactoryService.java:283)
 at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:191)
 at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:144)
 at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:97)
 at org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:64)
 ... 13 more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/python3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/python3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File
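The "Reason: No context matches." line in the trace above can be illustrated with a toy sketch of factory context matching. This is not Flink's actual implementation, just the idea: each registered factory declares required context properties, and it is selected only if every one of them matches the requested descriptor properties. The jar copied from the source build is the universal connector, whose factory declares connector.version=universal, so the requested connector.version=0.11 matches nothing:

```python
# Toy model of table factory context matching (illustration only,
# not Flink's real TableFactoryService code).
def matches(required_context, requested):
    # A factory is a candidate only if every required key/value pair
    # appears with the same value in the requested properties.
    return all(requested.get(k) == v for k, v in required_context.items())

# Properties requested by the descriptor in this thread:
requested = {"connector.type": "kafka", "connector.version": "0.11"}

# Universal connector jar (copied by mistake from the source build):
universal = {"connector.type": "kafka", "connector.version": "universal"}
# The 0.11 connector jar that resolves the problem:
kafka_011 = {"connector.type": "kafka", "connector.version": "0.11"}

print(matches(universal, requested))   # False -> "No context matches."
print(matches(kafka_011, requested))   # True  -> factory found
```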